DETECTION OF SPATIAL CLUSTERING WITH AVERAGE
LIKELIHOOD RATIO TEST STATISTICS
National University of Singapore

Generalized likelihood ratio (GLR) test statistics are often used in
the detection of spatial clustering in case-control and case-population
datasets to check for a significantly large proportion of cases within
some scanning window. The traditional spatial scan test statistic
takes the supremum GLR value over all windows, whereas the average
likelihood ratio (ALR) test statistic that we consider here takes an
average of the GLR values. Numerical experiments in the literature
and in this paper show that the ALR test statistic has more power
than the spatial scan statistic. We develop in this paper accurate
tail probability approximations of the ALR test statistic that
allow us to bypass computer-intensive Monte Carlo procedures to
estimate p-values. In models that adjust for covariates, these Monte
Carlo evaluations require an initial fitting of parameters that can
result in very biased p-value estimates.

1. Introduction. The detection of local clustering in spatial point
processes is of interest in epidemiological studies, forestry, geological
studies, neural imaging and astronomy. There are a number of excellent texts
and review papers on this, including [5, 13, 29]. A classical application that
will be used here as an illustrative example is the identification of potential
sources of environmental pollution that have contributed to higher rates of
disease cases for residents living in their vicinity.

Let $T = \{t_i : 1 \le i \le I\}$, with $t_i \in \mathbb{R}^d$ denoting the location of the $i$th case.
We are interested in the presence of an unusually large number of cases near
an unspecified location $v = (v_1, \ldots, v_d)$ inside a bounded domain $D$. If $T$
is generated from a process with known and constant intensity under the
null hypothesis, we can test for the presence of clusters by computing the
maximal number of cases in the cubic windows $\prod_{k=1}^d [v_k - w/2, v_k + w/2]$ over all
$v \in D$ for a fixed window size $w > 0$. The question of whether this number
is significantly large or may have occurred with reasonable chance under the
null hypothesis was addressed in [21, 23], via asymptotic p-value calculations
and p-value bounds. Extensions to weighted counting using kernel functions
were also achieved in [27].

Rather than assuming that the underlying intensity is known and constant,
we can assume instead that a control dataset $U = \{u_j : 1 \le j \le J - I\}$
is available for estimation of the possibly nonconstant intensity function.
There has been considerable work done on the use of kernel functions to
smooth $U$ to provide an intensity estimate, and the significance of a cluster
of cases is calculated by assuming that the estimated intensity is the
true intensity (see, e.g., [1, 7, 9] and references therein). An alternative
approach, as considered in [6, 26], is to merge $T$ and $U$ into a combined
dataset $\mathcal{X} := \{(t_i, 1) : 1 \le i \le I\} \cup \{(u_j, 0) : 1 \le j \le J - I\}$ and rewrite it as
$\{(x_i, X_i) : 1 \le i \le J\}$. The SaTScan software developed by Kulldorff and Information
Management Services Inc. [16] (see also [17]) considers merged
datasets, with generalized likelihood ratio (GLR) test statistics used to provide
a score for each window, and the spatial scan statistic, the supremum
GLR score, used to determine significance. Instead of cubic windows, spherical
windows $C(v, w) := \{t : \sum_{k=1}^d (v_k - t_k)^2 \le w^2\}$ are considered.

In Section 2, we consider the average likelihood ratio (ALR) test statistic,
which uses an average rather than the supremum GLR score as the summary
test statistic. Numerical studies in the literature and in this paper show
that the ALR test statistic has more power than the spatial scan
test statistic. We provide moderate deviation tail probability approximations
for the ALR test statistic in Section 2.1 and illustrate their extensions
to logistic regression models for covariate adjustments in Section 3. These
p-value approximations allow us to avoid the use of computationally expensive
Monte Carlo methods and are especially important when covariate
adjustments are required, as the Monte Carlo method currently in use requires
an initial fitting of parameters that can result in very biased p-value
estimates (see Examples 1 and 2 in Section 3.1). In Section 4, we perform
comparison studies on real and simulated datasets. A discussion is provided
in Section 5, followed by derivations of the asymptotic formulae in Section
6. The appendices contain technical details and proofs.

2. The spatial scan and ALR test statistics. Throughout this paper, we
shall use $\|\cdot\|$ to denote the $L_2$ norm of a vector. For any set $A$, vector $t$ and
real number $b$, we shall let $t + bA = \{t + ba : a \in A\}$. We shall use $\mathbf{I}$ to denote
the indicator function and $\#$ to denote the number of elements in a finite
set. For constants $a_n$ and $b_n$, the notation $a_n \sim b_n$ shall mean $a_n/b_n \to 1$,
while for random variables $Y_1, Y_2, \ldots$ and $Z_1, Z_2, \ldots$, the notation $Y_n \sim Z_n$
shall mean $Y_n/Z_n \xrightarrow{p} 1$. We shall use $\mathbb{Z}$ to denote the set of integers and $\mathbf{0}$ to
denote the zero vector. We shall also adopt the conventions $0 \log 0 = 0$ and
$0^0 = 1$.

Let $\mathcal{X} = \{(x_i, X_i) : 1 \le i \le J\}$, where $x_i$ denotes the location of the $i$th
subject, while $X_i = 1$ if the subject is a case and $X_i = 0$ otherwise. Conditioned
on $\mathbf{x} := (x_1, \ldots, x_J)$, the random vector $\mathbf{X} := (X_1, \ldots, X_J)$ consists
of independent Bernoulli random variables. Under the null hypothesis $H_0$ of
no clustering, there exists $p_0 \in (0, 1)$ such that

$$P\{X_i = 1\} = p_0 \quad \text{for all } 1 \le i \le J. \tag{2.1}$$

Let $B$ be a subset of $\mathbb{R}^d$ and $H_B^{(1)}$ the hypothesis that there exists $p_1 > p_2$
such that

$$P\{X_i = 1 \mid x_i \in B\} = p_1, \qquad P\{X_i = 1 \mid x_i \notin B\} = p_2 \quad \text{for all } 1 \le i \le J. \tag{2.2}$$

Let $\hat p_0 = I/J$ be the maximum likelihood estimate (MLE) of $p_0$ under $H_0$
and let

$$\phi(p) = p \log\Big(\frac{p}{\hat p_0}\Big) + (1 - p) \log\Big(\frac{1 - p}{1 - \hat p_0}\Big). \tag{2.3}$$

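Let $m_B = \sum_{i=1}^J \mathbf{I}_{\{x_i \in B,\, X_i = 1\}}$ be the number of cases and $n_B = \sum_{i=1}^J \mathbf{I}_{\{x_i \in B\}}$
the number of subjects in $B$. The log GLR score for testing $H_0$ against $H_B^{(1)}$
is

$$\begin{aligned}
S^{(1)}(B) &:= \log\Big\{\sup_{p_1 > p_2} \big[p_1^{m_B}(1 - p_1)^{n_B - m_B} p_2^{I - m_B}(1 - p_2)^{J - I - (n_B - m_B)}\big]\Big\} - \log\big[\hat p_0^{I}(1 - \hat p_0)^{J - I}\big] \\
&= \Big[n_B\, \phi\Big(\frac{m_B}{n_B}\Big) + (J - n_B)\, \phi\Big(\frac{I - m_B}{J - n_B}\Big)\Big] \mathbf{I}_{\{m_B/n_B > \hat p_0\}}.
\end{aligned}$$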



To detect both over- and under-clustering, we compare $H_0$ against the
two-sided alternative hypothesis $H_B^{(2)}$ that (2.2) holds for some $p_1 \ne p_2$. The
log GLR score is then

$$S^{(2)}(B) := n_B\, \phi\Big(\frac{m_B}{n_B}\Big) + (J - n_B)\, \phi\Big(\frac{I - m_B}{J - n_B}\Big). \tag{2.4}$$



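Let $\mathcal{B}$ be a finite class of measurable subsets of $\mathbb{R}^d$, possibly dependent on $\mathbf{x}$
but not on $\mathbf{X}$. The spatial scan statistic for testing $H_0$ versus $\bigcup_{B \in \mathcal{B}} H_B^{(k)}$, $k = 1$
or $2$, is

$$M_{\mathcal{B}}^{(k)} := \sup_{B \in \mathcal{B}} S^{(k)}(B). \tag{2.5}$$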
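A minimal Python sketch of the scores (2.3)-(2.5), assuming each window is summarized by its pair $(m_B, n_B)$ with $0 < n_B < J$. The helper names (`phi`, `glr_score`, `scan_statistic`) are illustrative only and are not taken from SaTScan or any other package.

```python
import math
from typing import Iterable, Tuple


def phi(p: float, p0_hat: float) -> float:
    """phi(p) of (2.3): binary Kullback-Leibler divergence, using 0 log 0 = 0."""
    def xlog_ratio(a: float, b: float) -> float:
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return xlog_ratio(p, p0_hat) + xlog_ratio(1.0 - p, 1.0 - p0_hat)


def glr_score(m_B: int, n_B: int, I: int, J: int, two_sided: bool = True) -> float:
    """S^(2)(B) of (2.4), or S^(1)(B) when two_sided is False (requires 0 < n_B < J)."""
    p0_hat = I / J
    p_in = m_B / n_B                  # MLE of p_1, the case rate inside B
    p_out = (I - m_B) / (J - n_B)     # MLE of p_2, the case rate outside B
    if not two_sided and p_in <= p0_hat:
        return 0.0                    # one-sided score is zero without an excess inside B
    return n_B * phi(p_in, p0_hat) + (J - n_B) * phi(p_out, p0_hat)


def scan_statistic(windows: Iterable[Tuple[int, int]], I: int, J: int,
                   two_sided: bool = True) -> float:
    """Spatial scan statistic M^(k) of (2.5): the largest score over all windows."""
    return max(glr_score(m, n, I, J, two_sided) for m, n in windows)


# A window with four cases and one control, among I = 58 cases and J = 1036 subjects.
print(round(glr_score(4, 5, 58, 1036, two_sided=False), 2))
```

With the laryngeal cancer counts of Section 4.1 ($I = 58$, $J = 1036$), the last line should print a value close to the largest window score 9.21 reported there for a window containing four cases and one control.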
The spatial scan statistic has the drawback of not making full use of the
information provided by secondary clusters to conclude the presence of local
clustering. For example, if there are scores $S^{(k)}(B_1) > S^{(k)}(B_2)$ for nonoverlapping
windows $B_1$ and $B_2$, both slightly smaller than the critical value, the
information provided by $S^{(k)}(B_2)$ is not utilized in the decision not to reject
$H_0$. Gangnon and Clayton [12] introduced the weighted ALR test statistic

$$\sum_{B \in \mathcal{B}} w_B\, e^{S^{(2)}(B)} \quad \text{with } w_B > 0 \text{ for all } B \text{ and } \sum_{B \in \mathcal{B}} w_B = 1.$$

Unlike the spatial scan statistic, significance for the weighted ALR test
statistic can be concluded based on many moderately large scores. The numerical
studies in [12] suggest that the weighted ALR is more powerful than
the spatial scan statistic in the detection of local clusters. Siegmund [28] also
reports that a closely related test statistic is slightly more powerful than
the scan test statistic in a numerical study of the genome scan. This
is in contrast to global clustering test statistics like $(\#\mathcal{B})^{-1} \sum_{B \in \mathcal{B}} S^{(k)}(B)$,
which are expected to have lower power than the spatial scan statistic
when only a few local clusters are present (see [18] for supporting numerical
results). We consider in this paper p-value approximations for the (log) ALR
test statistic

$$U_{\mathcal{B}}^{(k)} := 2 \log\Big((\#\mathcal{B})^{-1} \sum_{B \in \mathcal{B}} e^{S^{(k)}(B)}\Big). \tag{2.6}$$

An extension of these approximations to weighted ALR test statistics is 
given in the appendices of [4]. 
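The average in (2.6) can be computed stably with the log-sum-exp trick. The sketch below (the name `alr_statistic` is illustrative) takes precomputed scores $S^{(k)}(B)$ and contrasts $U_{\mathcal{B}}^{(k)}$ with the scan statistic when a secondary window also has a moderately large score.

```python
import math
from typing import Sequence


def alr_statistic(scores: Sequence[float]) -> float:
    """U^(k) = 2 log((#B)^{-1} sum_B exp(S^(k)(B))) of (2.6), via log-sum-exp."""
    s_max = max(scores)
    # Factor out exp(s_max) so that large scores do not overflow.
    total = sum(math.exp(s - s_max) for s in scores)
    return 2.0 * (s_max + math.log(total / len(scores)))


one_cluster = [6.0] + [0.0] * 99         # one window with a large score
two_clusters = [6.0, 6.0] + [0.0] * 98   # plus a secondary cluster of equal strength
print(max(one_cluster), max(two_clusters))                      # scan statistic unchanged
print(alr_statistic(one_cluster), alr_statistic(two_clusters))  # ALR increases
```

The scan statistic equals 6 in both cases, whereas the ALR rises from about 3.2 to about 4.4 once the secondary cluster is present. Since the average lies between $e^{M}/\#\mathcal{B}$ and $e^{M}$, one always has $2M_{\mathcal{B}}^{(k)} - 2\log(\#\mathcal{B}) \le U_{\mathcal{B}}^{(k)} \le 2M_{\mathcal{B}}^{(k)}$.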

2.1. Moderate deviation tail probabilities. In this paper, we provide tail
approximations of the ALR test statistics under the following assumptions.

(A1) The domain $D$ is a compact subset of $\mathbb{R}^d$ and satisfies

$$\#\{t \in (\epsilon\mathbb{Z})^d : t + [0, \epsilon]^d \subset D\} \sim \#\{t \in (\epsilon\mathbb{Z})^d : (t + [0, \epsilon]^d) \cap D \ne \emptyset\} \sim |D|/\epsilon^d$$

as $\epsilon \to 0$.

(A2) The locations $x_1, \ldots, x_J$ are independent and identically distributed
(i.i.d.) random vectors generated from $\lambda$, a continuous and positive density
on $D$.

(A3) The class of scanning sets $\mathcal{B}$ is a subclass of $\mathcal{C} := \{v + wA : v \in D,\ w_0 \le w \le w_1\}$,
where $A$ is a convex, open and bounded subset of $\mathbb{R}^d$,
with $\mathbf{0} \in A$ and $0 < w_0 \le w_1 < (|D|/|A|)^{1/d}$.

In Theorem 1 below and Theorem 2 in Section 3, $\mathcal{B}\ (= \mathcal{B}_c)$ may vary with
the critical value $c$, and constraints are placed only on the growth of $J$ (for
Theorem 1) and $\#\mathcal{B}$ with respect to $c$. The class $\mathcal{C}$ of candidate scanning
sets is, however, fixed for all $c > 0$. The proofs of the theorems use change
of measure arguments and linearization techniques developed by Lai and
Siegmund [19, 20] and Woodroofe [32, 33] to analyze GLR test statistics in
sequential analysis, and are given in Section 6. A motivation of the proofs
is also given by a simpler Theorem 3 and its proof in Appendix A. Let $\chi^2_1$
denote a chi-square random variable with one degree of freedom.

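Theorem 1. Assume (A1)-(A3) and let (2.1) hold for some $0 < p_0 < 1$.
Let $\log(\#\mathcal{B}) = o(c^{1/3})$ and assume that $c \sim \kappa J^s$ for some $\kappa > 0$ and $0 < s < 1$.
Then as $c \to \infty$,

$$P\{U_{\mathcal{B}}^{(k)} > c \mid \mathbf{x}\} \sim k P\{\chi^2_1 > c\}/2 \quad \text{for } k = 1, 2. \tag{2.7}$$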

The assumptions (A2), (A3) and the relation $c \sim \kappa J^s$ in the statement
of Theorem 1 are needed to ensure that the number of subjects in each
$B \in \mathcal{B}$ approaches infinity fast enough for a chi-square tail probability approximation
of $S^{(k)}(B)$ to hold. This leads to the chi-square tail probability
approximation of $U_{\mathcal{B}}^{(k)}$. The uniform approximation when conditioning on $\mathbf{x}$
in (2.7) ensures that we do not reject $H_0$ unevenly with respect to the configuration
of the locations. However, it is also important for us to check the
actual type I error probability when $\mathbf{x}$ is not conditioned on (see Example
2 in Section 3.1).
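In practice, (2.7) turns an observed value $u$ of $U_{\mathcal{B}}^{(k)}$ into the approximate p-value $kP\{\chi^2_1 > u\}/2$. A minimal sketch using only the Python standard library, relying on the identity $P\{\chi^2_1 > c\} = \operatorname{erfc}(\sqrt{c/2})$; the function names are illustrative, and the critical value is found by simple bisection.

```python
import math


def chi2_1_sf(c: float) -> float:
    """P{chi^2_1 > c} = P{|N(0, 1)| > sqrt(c)} = erfc(sqrt(c / 2))."""
    return math.erfc(math.sqrt(c / 2.0))


def alr_pvalue(u: float, k: int) -> float:
    """Approximate p-value k P{chi^2_1 > u} / 2 of an observed ALR value u, from (2.7)."""
    if k not in (1, 2):
        raise ValueError("k must be 1 (one-sided) or 2 (two-sided)")
    return k * chi2_1_sf(u) / 2.0


def alr_critical_value(alpha: float, k: int, lo: float = 0.0, hi: float = 200.0,
                       tol: float = 1e-10) -> float:
    """Solve k P{chi^2_1 > c} / 2 = alpha for c by bisection (the tail decreases in c)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if alr_pvalue(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(alr_pvalue(5.29, 1), alr_critical_value(0.01, 1))
```

For example, the one-sided 1% critical value `alr_critical_value(0.01, 1)` is about 5.41, the square of the standard normal 99% quantile.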

3. Logistic modeling. To see why (2.7) extends to more complicated
models, it is useful to view it as resulting from two different asymptotics.
Let $\lambda_B = \int_B \lambda(t)\, dt$, where $\lambda$ is the density in (A2). Let $\omega$ be Gaussian white
noise with $\omega(B) \sim N(0, \lambda_B)$ for $B \subset D$ and $\omega(A), \omega(B)$ independent whenever
$A$ and $B$ are disjoint. Let $Z_B = \lambda_B^{-1/2}(1 - \lambda_B)^{-1/2}[\omega(B) - \lambda_B\, \omega(D)]$.
The first asymptotic is a weak convergence of $S^{(2)}(B)$ to $Z_B^2/2$ uniformly
over $B \in \mathcal{C}$, and this holds largely because $\inf_{B \in \mathcal{C}}(n_B/c) \to \infty$ when $c \sim \kappa J^s$
for $0 < s < 1$. The second asymptotic is like (2.7) [see (3.2) below], but with the
ALRs $U_{\mathcal{B}}^{(1)}$ and $U_{\mathcal{B}}^{(2)}$ replaced by

$$\tilde U_{\mathcal{B}}^{(1)} := 2 \log\Big((\#\mathcal{B})^{-1} \sum_{B \in \mathcal{B}} e^{Z_{B+}^2/2}\Big) \quad \text{and} \quad \tilde U_{\mathcal{B}}^{(2)} := 2 \log\Big((\#\mathcal{B})^{-1} \sum_{B \in \mathcal{B}} e^{Z_B^2/2}\Big), \tag{3.1}$$

respectively, where $Z_{B+} = \max\{Z_B, 0\}$.

Theorem 2. Assume (A1), (A3) and let $\log(\#\mathcal{B}) = o(c^{1/3})$. Then as
$c \to \infty$,

$$P\{\tilde U_{\mathcal{B}}^{(k)} > c\} \sim k P\{\chi^2_1 > c\}/2 \quad \text{for } k = 1, 2. \tag{3.2}$$
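Since $|U_{\mathcal{B}}^{(2)} - \tilde U_{\mathcal{B}}^{(2)}| \le 2 \sup_{B \in \mathcal{B}} |S^{(2)}(B) - Z_B^2/2|$ and $|U_{\mathcal{B}}^{(1)} - \tilde U_{\mathcal{B}}^{(1)}| \le
2 \sup_{B \in \mathcal{B}} |S^{(1)}(B) - Z_{B+}^2/2|$ (both because $|\log \sum_B e^{a_B} - \log \sum_B e^{b_B}| \le \sup_B |a_B - b_B|$),
the combination of the two asymptotics described
above provides us with chi-square tail approximations for $U_{\mathcal{B}}^{(k)}$.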

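Consider more generally datasets containing additional information like
the age, sex, diet and smoking habits of the subjects. These covariates
may influence the outcome and, hence, we may have to correct for spatial
imbalances of these covariates when testing for spatial clustering. Let
$u_i = (u_{i1}, \ldots, u_{ir})'$ be the covariate vector of the $i$th subject, with $u_{i1} = 1$
denoting the intercept term, and let $p_i = P\{X_i = 1 \mid x_i, u_i\}$. Consider the
logistic model

$$p_i = (1 + e^{-\beta' u_i - \theta_i})^{-1}, \tag{3.3}$$

where $\beta = (\beta_1, \ldots, \beta_r)'$ is a nuisance parameter vector. Under the null hypothesis
$H_0$ of no clustering, $\theta_i = 0$ for all $i$, while under the one-sided
alternative hypothesis $H_B^{(1)}$, $\theta_i = \theta \mathbf{I}_{\{x_i \in B\}}$ for some $\theta > 0$. Under the two-sided
alternative hypothesis $H_B^{(2)}$, $\theta_i = \theta \mathbf{I}_{\{x_i \in B\}}$ for some $\theta \ne 0$. Let $\hat\beta$ be
the MLE of $\beta$ under $H_0$ and $(\hat\beta_B^{(k)}, \hat\theta_B^{(k)})$ the MLE of $(\beta, \theta)$ under $H_0 \cup H_B^{(k)}$.
Define

$$\begin{gathered}
\hat p_i = (1 + e^{-\hat\beta' u_i})^{-1}, \qquad \hat p_i^{(k,B)} = \big(1 + e^{-\hat\beta_B^{(k)\prime} u_i - \hat\theta_B^{(k)} \mathbf{I}_{\{x_i \in B\}}}\big)^{-1}, \\
\phi_i(p) = p \log\Big(\frac{p}{\hat p_i}\Big) + (1 - p) \log\Big(\frac{1 - p}{1 - \hat p_i}\Big).
\end{gathered} \tag{3.4}$$

Then the ALR test statistics are

$$U_{\mathcal{B}}^{(k)} = 2 \log\Big((\#\mathcal{B})^{-1} \sum_{B \in \mathcal{B}} e^{S^{(k)}(B)}\Big), \quad \text{where } S^{(k)}(B) = \sum_{i=1}^J \phi_i\big(\hat p_i^{(k,B)}\big). \tag{3.5}$$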
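The scores in (3.5) require one refit of the logistic model (3.3) per window. A minimal NumPy sketch for the two-sided case $k = 2$, assuming no window separates cases from noncases perfectly (so that all MLEs are finite). `fit_logistic` is a plain Newton-Raphson (IRLS) fit written here for illustration, not a library routine, and the other function names are likewise hypothetical.

```python
import numpy as np


def fit_logistic(design: np.ndarray, y: np.ndarray,
                 n_iter: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Logistic-regression MLE by Newton-Raphson (IRLS); assumes no separation."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ coef))
        grad = design.T @ (y - p)                              # score vector
        hess = design.T @ (design * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    return coef


def fitted_probs(design: np.ndarray, y: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-design @ fit_logistic(design, y)))


def covariate_adjusted_scores(u, y, windows):
    """Two-sided scores S^(2)(B) = sum_i phi_i(p_hat_i^(2,B)) of (3.4)-(3.5).

    u: (J, r) covariates whose first column is all ones; y: (J,) 0/1 outcomes;
    windows: iterable of boolean masks, mask[i] True when x_i lies in B.
    """
    p_hat = fitted_probs(u, y)                        # fit under H_0 (theta_i = 0)
    scores = []
    for in_B in windows:
        design = np.column_stack([u, in_B.astype(float)])
        p_B = fitted_probs(design, y)                 # fit under H_0 union H_B^(2)
        phi_i = p_B * np.log(p_B / p_hat) + (1 - p_B) * np.log((1 - p_B) / (1 - p_hat))
        scores.append(phi_i.sum())
    return np.array(scores)


def alr_from_scores(scores: np.ndarray) -> float:
    """U^(2) of (3.5), computed with the log-sum-exp trick."""
    s_max = np.max(scores)
    return float(2.0 * (s_max + np.log(np.mean(np.exp(scores - s_max)))))
```

Because the fitted alternative model contains the columns $u_i$ and $\mathbf{I}_{\{x_i \in B\}}$, its score equations make $\sum_i \phi_i(\hat p_i^{(2,B)})$ coincide with the maximized log-likelihood ratio, which gives a convenient numerical check of such an implementation.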

The scores $S^{(k)}(B)$ are asymptotically chi-square, even when $\beta$ is infinite
dimensional (see [2, 22] and references therein). The efficient score expansions
of the log profile likelihoods that are used for deriving these chi-square
approximations can also be used to provide the covariance structure of the
limiting multivariate normal of $\sqrt{n}\,\hat\theta_B$ over $B \in \mathcal{B}$, and this structure depends
on the nuisance parameter under $H_0$ (see Appendix B for more details).
However, the chi-square approximations of $U_{\mathcal{B}}^{(k)}$ in the moderate deviations
domain do not depend on the covariance structure of the limiting
multivariate normal. In other words,

$$P_\beta\{U_{\mathcal{B}}^{(k)} > c\} \sim k P\{\chi^2_1 > c\}/2 \quad \text{for } k = 1, 2 \tag{3.6}$$

uniformly over compact sets of $\beta$ (see Appendix B).






This is desirable because the p-value is in principle computed from the
worst-case scenario under $H_0$. In this respect, the ALR test statistic shares
the same uniform asymptotics as the GLR test statistic for a composite null
hypothesis versus a single composite alternative hypothesis with a dimension
difference of one, differing only in that for the GLR test statistic, the approximation
also holds in the central limit domain. The spatial scan test
statistic does not have such uniform asymptotics over nuisance parameters.
Hence Theorems 1 and 2 are not just devices for p-value approximations,
but also theoretical results that provide understanding of the asymptotic
properties of the ALR test statistic. To reduce computational time for large
datasets, we can avoid searching for a new $(\hat\beta_B^{(k)}, \hat\theta_B^{(k)})$ for each $B \in \mathcal{B}$ by replacing
$S^{(k)}(B)$ by a first-order quadratic approximation (see either (4)-(6)
of [22] or (B.1) in Appendix B).

3.1. Monte Carlo evaluation of conditional p-values. Under (2.1), the
conditional p-value $P_0\{M_{\mathcal{B}}^{(k)} > c \mid I, \mathbf{x}\}$ does not depend on $p_0$ and can be
evaluated by a permutation test. Permutation tests are nonparametric tests
that compute p-values from permutations of the observations $X_1, \ldots, X_J$,
which are often assumed to be i.i.d. under the null hypothesis. In principle,
the p-value is the fraction of permutations with values of the test statistic at
least as large as the original test statistic, though in practice the number
of permutations is usually too large for direct computation, and Monte
Carlo methods are used instead to sample a random subset of permutations
for p-value estimation (for more details, see [10, 11]). In the SaTScan
software, users are prompted to select $L = 99$, 999 or 9999 random permutations.
For each $1 \le \ell \le L$, compute $M_{\mathcal{B}}^{(k),\ell}$ from $\{(x_i, X_{i\ell}) : 1 \le i \le J\}$,
where $(X_{1\ell}, \ldots, X_{J\ell})$ is a random permutation of $(X_1, \ldots, X_J)$. Then the estimated
conditional p-value is $\big(1 + \sum_{\ell=1}^L \mathbf{I}_{\{M_{\mathcal{B}}^{(k),\ell} \ge M_{\mathcal{B}}^{(k)}\}}\big)/(1 + L)$. The extension
of the method to estimate $P_0\{U_{\mathcal{B}}^{(k)} > c \mid I, \mathbf{x}\}$ is straightforward.
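The permutation recipe above can be sketched as follows for the ALR with two-sided scores. The windows are assumed to be given as a 0/1 membership matrix (rows are windows, columns are subjects) with $0 < n_B < J$ in every row; the function names are illustrative, and the plain loop makes no attempt at the speed needed for thousands of windows.

```python
import numpy as np


def two_sided_scores(membership: np.ndarray, X: np.ndarray) -> np.ndarray:
    """S^(2)(B) of (2.4) for every window (row) of a 0/1 membership matrix."""
    J, I = X.size, X.sum()
    p0_hat = I / J
    n_B = membership.sum(axis=1)
    m_B = membership @ X

    def phi(p):  # (2.3), vectorized, with 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            a = np.where(p > 0, p * np.log(p / p0_hat), 0.0)
            b = np.where(p < 1, (1 - p) * np.log((1 - p) / (1 - p0_hat)), 0.0)
        return a + b

    return n_B * phi(m_B / n_B) + (J - n_B) * phi((I - m_B) / (J - n_B))


def alr(scores: np.ndarray) -> float:
    s_max = scores.max()
    return float(2.0 * (s_max + np.log(np.mean(np.exp(scores - s_max)))))


def permutation_pvalue(membership, X, L=999, seed=None):
    """(1 + #{l : U^(2),l >= U^(2)}) / (1 + L), using L random permutations of X."""
    rng = np.random.default_rng(seed)
    observed = alr(two_sided_scores(membership, X))
    exceed = sum(
        alr(two_sided_scores(membership, rng.permutation(X))) >= observed
        for _ in range(L)
    )
    return (1 + exceed) / (1 + L)
```

Permuting the labels keeps the number of cases $I$ fixed, which is what makes the resulting p-value conditional on $I$ and $\mathbf{x}$.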

When covariates are present, the SaTScan software uses the following
Monte Carlo procedure, as advocated in [15]. Assume that there are $n_j$
subjects at location $v_j$ for $1 \le j \le q$, with $n_j$ large. Fit (3.3) under the null
hypothesis $H_0$ that there are no spatial effects, that is, with $\theta_i = 0$ for all $i$.
The fitted value $\hat p_i$, given in (3.4), is the estimated risk of the $i$th subject. At
each $v_j$, estimate the total risk by $\eta_j = \sum_{i : x_i = v_j} \hat p_i$. Let $m_j = \sum_{i : x_i = v_j} X_i$,
$m_B = \sum_{v_j \in B} m_j$ and $\eta_B = \sum_{v_j \in B} \eta_j$. Assume that under $H_0$, $m_1, \ldots, m_q$
are independent Poisson random variables with respective means $\eta_1, \ldots, \eta_q$.
Then conditioned on $m_1 + \cdots + m_q = I$, the adjusted spatial scan statistic for
testing $H_0$ against $\bigcup_{B \in \mathcal{B}} H_B^{(2)}$ is

$$\tilde M_{\mathcal{B}}^{(2)} := \sup_{B \in \mathcal{B}} \tilde S^{(2)}(B), \quad \text{where}$$
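$$\tilde S^{(2)}(B) := m_B \log\Big(\frac{m_B}{\eta_B}\Big) + (I - m_B) \log\Big(\frac{I - m_B}{I - \eta_B}\Big). \tag{3.7}$$

To simulate the Monte Carlo p-value, for each $1 \le \ell \le L$, where $L$ is the
required number of simulation runs, generate $(m_{1\ell}, \ldots, m_{q\ell})$ from a multinomial
distribution with $I$ trials and success probabilities $(\eta_1/I, \ldots, \eta_q/I)$,
then compute $\tilde M_{\mathcal{B}}^{(2),\ell}$ using (3.7). The estimated p-value is then

$$\Big(1 + \sum_{\ell=1}^L \mathbf{I}_{\{\tilde M_{\mathcal{B}}^{(2),\ell} \ge \tilde M_{\mathcal{B}}^{(2)}\}}\Big)\Big/(1 + L).$$

Table 1
Comparison of the type I error probabilities and detection powers of $\tilde M_{\mathcal{B}}^{(2)}$ and $U_{\mathcal{B}}^{(2)}$ at
significance levels $\alpha = 0.05$ and $\alpha = 0.01$ with 1000 independent copies of $\mathbf{X}$

| $\theta$ | MC: $\tilde M_{\mathcal{B}}^{(2)}$, $\alpha = 0.05$ | ALR: $U_{\mathcal{B}}^{(2)}$, $\alpha = 0.05$ | MC: $\tilde M_{\mathcal{B}}^{(2)}$, $\alpha = 0.01$ | ALR: $U_{\mathcal{B}}^{(2)}$, $\alpha = 0.01$ |
|---|---|---|---|---|
| 0 | 0.026 | 0.048 | 0.004 | 0.008 |
| 0.2 | 0.088 | 0.158 | 0.021 | 0.054 |
| 0.4 | 0.367 | 0.499 | 0.137 | 0.261 |
| 0.6 | 0.740 | 0.849 | 0.506 | 0.676 |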
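The risk-adjusted Monte Carlo procedure described before Table 1 can be sketched as follows. It assumes the expected counts $\eta_j$ have already been obtained from a null fit of (3.3), and that windows are given as a 0/1 matrix over the $q$ locations with $0 < \eta_B < I$. This only mimics the steps described above; it is not SaTScan's own code, and the function names are hypothetical.

```python
import numpy as np


def adjusted_scores(m: np.ndarray, eta: np.ndarray, windows: np.ndarray) -> np.ndarray:
    """S~^(2)(B) of (3.7) for each window; windows is a (#B, q) 0/1 matrix."""
    I = m.sum()
    m_B, eta_B = windows @ m, windows @ eta

    def xlog_ratio(a, b):  # a log(a / b), with 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(a > 0, a * np.log(a / b), 0.0)

    return xlog_ratio(m_B, eta_B) + xlog_ratio(I - m_B, I - eta_B)


def adjusted_scan_pvalue(m, eta, windows, L=999, seed=None):
    """Monte Carlo p-value of the adjusted scan statistic M~^(2)."""
    rng = np.random.default_rng(seed)
    I = int(m.sum())
    observed = adjusted_scores(m, eta, windows).max()
    probs = eta / eta.sum()  # equals eta / I when the null fit includes an intercept
    exceed = 0
    for _ in range(L):
        m_sim = rng.multinomial(I, probs)  # counts conditioned on m_1 + ... + m_q = I
        exceed += int(adjusted_scores(m_sim, eta, windows).max() >= observed)
    return (1 + exceed) / (1 + L)
```

Note that the $\eta_j$ are held fixed at their values from the single initial fit; the simulated datasets are never refitted, which is where the initial parameter fitting enters the p-value.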

Example 1. Let $D$ be a union of disjoint sets $B_1$, $B_2$ and $B_3$, each
containing 1000 subjects. Generate dummy covariates $u_i \sim N(0, 1)$ if $x_i \in
B_2 \cup B_3$ and $u_i \sim N(1, 1)$ if $x_i \in B_1$, then keep them fixed for the remaining
part of this exercise. Let $\mathcal{B} = \{B_1, B_2, B_3\}$ and let

$$P\{X_i = 1 \mid x_i, u_i\} = \big(1 + e^{-\beta_1 - \theta \mathbf{I}_{\{x_i \in B_1\}}}\big)^{-1}. \tag{3.8}$$

In our comparison study, we generate $\mathbf{X} = (X_1, \ldots, X_{3000})$ from (3.8) with
$\beta_1 = -3$, $\theta \ge 0$ and compute the Monte Carlo p-values of $\tilde M_{\mathcal{B}}^{(2)}$ with $L = 999$
simulation runs and also the p-values of $U_{\mathcal{B}}^{(2)}$ using chi-square tail probability
approximations. The scores $S^{(2)}(B)$ are computed from (3.4)-(3.5) with $u_i$
the only covariate of the $i$th subject. For each $\theta \ge 0$, the above procedure is
repeated 1000 times, each time with a different copy of $\mathbf{X}$. The estimated
type I error probabilities and power are summarized in Table 1. We see that
the Monte Carlo risk adjustment method provides very conservative p-values
(see [3] for alternative strategies to deal with this drawback).

Example 2. We choose a slightly different design here to check the type
I error probability and power $P\{U_{\mathcal{B}}^{(2)} > c\}$ (without conditioning on $\mathbf{x}$). In
each simulation run, twenty locations $v_1, \ldots, v_{20}$ are generated uniformly
and randomly on the unit square $[0, 1]^2$. Let $C_0$ be a circle of radius 0.3,
centered at $(0.5, 0.5)$. Fifty subjects are located at each $v_j$, each of them
generated as a case with probability $p_0 = 0.05$ if $v_j \notin C_0$, and generated
as a case with probability $p_1 \ge p_0$ if $v_j \in C_0$. Each subject at $v_j$ is given a
dummy covariate distributed as $N(0, 1)$ if $v_j \notin C_0$ and distributed as $N(1, 1)$
if $v_j \in C_0$. For each $1 \le i \le 20$, let $0 = r_{i1} \le \cdots \le r_{i,20}$ be the ordered values
of $\|v_i - v_j\|$ for $j = 1, \ldots, 20$. We consider the class of scanning sets

$$\mathcal{B} = \{C(v_i, r_{ij}) : 1 \le i \le 20,\ 1 \le j \le 10\},$$

where $C(v, r)$ is a circle of radius $r$, centered at $v$. One thousand simulation
runs are used to estimate each type I error probability and power of the
adjusted scan statistic $\tilde M_{\mathcal{B}}^{(2)}$ (using $L = 999$ permutations) and the ALR test
statistic $U_{\mathcal{B}}^{(2)}$ (using the chi-square distribution) (see Table 2). We see that
the Monte Carlo method has low type I error probability and a corresponding
loss of power when compared against the ALR test statistic.

Table 2
Comparison of the type I error probabilities and detection powers of $\tilde M_{\mathcal{B}}^{(2)}$ and $U_{\mathcal{B}}^{(2)}$ at
significance levels $\alpha = 0.05$ and $\alpha = 0.01$ with 1000 simulation runs

| $p_1$ | MC: $\tilde M_{\mathcal{B}}^{(2)}$, $\alpha = 0.05$ | ALR: $U_{\mathcal{B}}^{(2)}$, $\alpha = 0.05$ | MC: $\tilde M_{\mathcal{B}}^{(2)}$, $\alpha = 0.01$ | ALR: $U_{\mathcal{B}}^{(2)}$, $\alpha = 0.01$ |
|---|---|---|---|---|
| 0.05 | 0.026 | 0.053 |  | 0.007 |
| 0.2 | 0.054 | 0.119 | 0.008 | 0.026 |
| 0.4 | 0.243 | 0.395 | 0.111 | 0.209 |
| 0.6 | 0.514 | 0.682 | 0.327 | 0.484 |
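The scanning class $\mathcal{B}$ of Example 2 uses, at each location, the circles that just reach its $j$th nearest location. A minimal NumPy sketch that returns membership at the level of the 20 locations (each location carries 50 subjects), assuming there are no tied distances; `nearest_neighbour_circles` is an illustrative name.

```python
import numpy as np


def nearest_neighbour_circles(v: np.ndarray, n_radii: int = 10) -> np.ndarray:
    """Rows mark the locations inside C(v_i, r_ij), for i = 1..len(v), j = 1..n_radii.

    r_i1 = 0 <= r_i2 <= ... are the ordered distances from v_i to all locations,
    so C(v_i, r_ij) is the closed circle through the j-th nearest location.
    """
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)  # d[i, k] = ||v_i - v_k||
    radii = np.sort(d, axis=1)[:, :n_radii]
    return np.array([d[i] <= radii[i, j]
                     for i in range(len(v)) for j in range(n_radii)])


rng = np.random.default_rng(0)
v = rng.uniform(size=(20, 2))        # twenty locations on the unit square
circles = nearest_neighbour_circles(v)
print(circles.shape)                 # (200, 20)
```

Because $\mathcal{B}$ is built from the locations alone, it depends on $\mathbf{x}$ but not on $\mathbf{X}$, as required in Section 2.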

4. Numerical examples. We analyze a case-control dataset in Section 
4.1, a case-population dataset in Section 4.2 and various simulated datasets 
in Section 4.3. 

4.1. Laryngeal cancer dataset. This dataset consists of (i) the locations
of 58 cases of laryngeal cancer occurring in two districts in Lancashire for
the period 1974-1985; and (ii) the locations of 978 control cases of lung
cancer for the same period and districts in the domain $D = [34500, 36500] \times
[41100, 43100]$ (see [8] for more background). A key feature is a cluster of four
laryngeal cancer cases (see the bottom of the left plot of Figure 1) located
near an industrial waste incinerator, which is considered a potential source
of the cluster of laryngeal cancer cases. We want to test for the presence
of local clusters without biasing ourselves a priori with information on the
possible sources of the laryngeal cancer cases. As the location co-ordinates
in the datasets are rounded to the nearest tens, we consider the covering
sets

$$\mathcal{B}_w := \{C(v, w) : v \in D \cap (10\mathbb{Z} + 5)^2,\ n_{C(v, w)} \ge 2\}$$

with radii $w = 40$, 50, 60 and 70. Hence each circle in $\mathcal{B}_w$ contains at least
two subjects and has a center $v \in D$ with co-ordinates ending in 5.
Express $M_{\mathcal{B}_w}^{(1)}$ and $U_{\mathcal{B}_w}^{(1)}$ more simply as $M_w^{(1)}$ and $U_w^{(1)}$, respectively.
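A minimal sketch of how the covering sets $\mathcal{B}_w$ might be enumerated, assuming subject locations in a $(J, 2)$ NumPy array and $D$ given as a rectangle. Centres run over $D \cap (10\mathbb{Z} + 5)^2$ one column at a time so that the distance matrix stays small; the function name and return format are illustrative.

```python
import numpy as np


def covering_sets(points, domain, w, spacing=10.0, offset=5.0, min_subjects=2):
    """Enumerate {C(v, w) : v in D and (spacing Z + offset)^2, n_C(v,w) >= min_subjects}.

    points: (J, 2) array of subject locations; domain: ((x_lo, x_hi), (y_lo, y_hi)).
    Returns an array of centres and, for each centre, the indices of subjects in C(v, w).
    """
    def grid(lo, hi):  # all values offset + spacing * k (k an integer) inside [lo, hi]
        k = np.arange(np.ceil((lo - offset) / spacing), np.floor((hi - offset) / spacing) + 1)
        return offset + spacing * k

    (x_lo, x_hi), (y_lo, y_hi) = domain
    gy = grid(y_lo, y_hi)
    centres, members = [], []
    for a in grid(x_lo, x_hi):
        col = np.column_stack([np.full(gy.size, a), gy])   # centres (a, b) for every b
        dist = np.linalg.norm(col[:, None, :] - points[None, :, :], axis=-1)
        inside = dist <= w                                  # closed circles C(v, w)
        keep = inside.sum(axis=1) >= min_subjects
        centres.extend(col[keep])
        members.extend(np.flatnonzero(row) for row in inside[keep])
    return np.array(centres), members
```

For the laryngeal data one would call it with the domain `((34500, 36500), (41100, 43100))` and an array holding all 1036 locations; the membership lists can then be turned into window counts $(m_B, n_B)$ for the scores.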

In Table 3, we tabulate Monte Carlo conditional p-values of both $M_w^{(1)}$
and $U_w^{(1)}$ using the permutation method described in Section 3.1. We observe
that for both the spatial scan and ALR test statistics, p-values below 0.02
are obtained when $w = 40$. This is in contrast to p-values of 0.08 to 0.8
obtained using kernel-based methods (see [1]). The choice of window size
$w$ affects the p-value substantially when using $M_w^{(1)}$, and this is also true
when using kernel-based methods. In contrast, the influence of window size
on the p-values of $U_w^{(1)}$ is much smaller. In this sense, the ALR test statistic
is more robust against misspecification of cluster shape and size, that is,
when $H_B^{(1)}$ is true for some $B \notin \mathcal{B}$, because in such a situation there will
often be many windows having moderately large scores, and this will aid the
rejection of $H_0$. The construction of Table 3 requires a substantial amount
of computation as there are more than 5000 scanning sets in each $\mathcal{B}_w$.

A numerical power study (see Table 4) indicates that the ALR and spatial
scan test statistics do not dominate each other when there is only one source
of spatial clustering. In this study, we fix the locations $\mathbf{x}$ and the total
number of cases $I = 58$. Consider a circle with radius 40 and let $n$ be the
number of points in it. Let $p$ be the probability that a point in the circle is
simulated as a case and $\tilde p$ the probability that a point outside the circle is
simulated as a case. Thus the relative risk (RR) is $p/\tilde p$. The numbers $p$ and
$\tilde p$ are determined from the constraint

$$np + (1036 - n)\tilde p = 58.$$

In the $\ell$th simulation run, $1 \le \ell \le 1000$, we generate $\{X_{i\ell} : 1 \le i \le 1036\}$ with
success probabilities $p$ (for $x_i$ inside the circle) or $\tilde p$ (for $x_i$ outside the circle), and
repeat until a total of 58 cases is observed before proceeding to compute
$U_{40}^{(1),\ell}$ and $M_{40}^{(1),\ell}$. The estimated power is the proportion of runs in which the
critical value is equaled or exceeded.

Fig. 1. Scatter plots of the 58 laryngeal cancer cases (left) and the 978 lung cancer cases
(right). [Plot not reproduced; horizontal axes: x co-ordinate.]

Table 3
Numerical values of the test statistics and Monte Carlo conditional p-value
estimates ± standard error for $\mathcal{B}_w$, with 2000 simulation runs for each spatial scan
statistic p-value and 10,000 runs for each ALR test statistic p-value

| $w$ | Spatial scan $M_w^{(1)}$: value | Spatial scan: MC p-val. (cond.) | ALR $U_w^{(1)}$: value | ALR: MC p-val. (cond.) |
|---|---|---|---|---|
| 40 | 9.21 | 0.016 ± 0.003 | 5.29 | 0.0104 ± 0.0010 |
| 50 | 7.95 | 0.090 ± 0.006 | 4.47 | 0.0137 ± 0.0012 |
| 60 | 7.95 | 0.078 ± 0.006 | 4.07 | 0.0200 ± 0.0014 |
| 70 | 7.95 | 0.079 ± 0.006 | 3.89 | 0.0213 ± 0.0014 |
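The simulation design of this power study can be sketched as below. Writing $p = \mathrm{RR}\,\tilde p$, the constraint gives $\tilde p = 58/(n\,\mathrm{RR} + 1036 - n)$, and the requirement of exactly 58 cases is met by redrawing (rejection sampling). The function names and the `max_tries` safeguard are illustrative assumptions.

```python
import numpy as np


def case_probabilities(n, rr, n_total=1036, n_cases=58):
    """Solve n p + (n_total - n) p~ = n_cases with p = rr * p~; returns (p, p~)."""
    p_out = n_cases / (n * rr + n_total - n)
    return rr * p_out, p_out


def simulate_conditioned(in_circle, rr, n_cases=58, rng=None, max_tries=100000):
    """Draw Bernoulli case labels, redrawing until exactly n_cases cases occur."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(in_circle.sum())
    p_in, p_out = case_probabilities(n, rr, in_circle.size, n_cases)
    probs = np.where(in_circle, p_in, p_out)
    for _ in range(max_tries):
        X = (rng.uniform(size=probs.size) < probs).astype(int)
        if X.sum() == n_cases:
            return X
    raise RuntimeError("no draw with the required number of cases")
```

For the first row of Table 4 ($n = 6$, RR $= 12$) this gives $\tilde p = 58/1102 \approx 0.053$ and $p \approx 0.63$. Redrawing until 58 cases appear is equivalent to sampling from the Bernoulli model conditioned on $I = 58$.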

We also try out scanning sets with different radii at different centers, as
suggested by a referee, and obtain similar p-values for the spatial scan and
ALR test statistics (see Table 5). The classes of scanning sets considered
here are of the form

$$\mathcal{B}_j = \{C(x_i, r_{ij}) : 1 \le i \le 1036\} \quad \text{for } j = 5, 6, 7,$$

where $0 = r_{i1} \le r_{i2} \le \cdots$ are the ordered values of $\|x_i - x_k\|$ for $1 \le k \le 1036$.
It is interesting to note that even though the largest window score of 9.21,
obtained from a scanning set containing four cases and one control, is missed
when $j = 6$, the ALR score actually increases.

Table 4
Powers of $U_{40}^{(1)}$ and $M_{40}^{(1)}$ based on 1000 simulation runs for each entry, with estimated
1% critical values $c_{M,0.01} = 9.49$ and $c_{U,0.01} = 5.29$. In each row, the $n$ points lying in a
circle centered at $(v_1, v_2)$ with radius $w = 40$ are simulated as cases with probability RR
times larger than points lying outside the circle

| $v_1$ | $v_2$ | $n$ | RR | Power of $U_{40}^{(1)}$ | Power of $M_{40}^{(1)}$ |
|---|---|---|---|---|---|
| 35565 | 41395 | 6 | 12 | 0.49 ± 0.02 | 0.36 ± 0.02 |
| 35195 | 42745 | 9 | 11 | 0.54 ± 0.02 | 0.54 ± 0.02 |
| 35515 | 42255 | 12 | 10 | 0.52 ± 0.02 | 0.68 ± 0.01 |
| 35255 | 42155 | 15 | 8 | 0.45 ± 0.02 | 0.47 ± 0.02 |
| 35595 | 42745 | 18 | 7 | 0.47 ± 0.02 | 0.42 ± 0.02 |

Table 5
Numerical values of the test statistics and Monte Carlo conditional p-value
estimates ± standard error with 2000 simulation runs in each entry for scanning sets with
different radii

| $j$ | Spatial scan $M_{\mathcal{B}_j}^{(1)}$: value | Spatial scan: MC p-val. (cond.) | ALR $U_{\mathcal{B}_j}^{(1)}$: value | ALR: MC p-val. (cond.) |
|---|---|---|---|---|
| 5 | 9.21 | 0.016 ± 0.003 | 5.38 | 0.012 ± 0.002 |
| 6 | 7.95 | 0.043 ± 0.005 | 5.76 | 0.006 ± 0.002 |
| 7 | 7.04 | 0.079 ± 0.006 | 3.70 | 0.027 ± 0.004 |

4.2. New York leukaemia dataset. We use here an updated version of the
dataset presented in [31], which tracks leukaemia occurrences in 281 census
tracts in New York state. Let $v_j$ denote the centroid of the $j$th census tract
and let $m_j$ and $n_j$ be the number of leukaemia cases and the population size,
respectively, at $v_j$. Let $m_B = \sum_{v_j \in B} m_j$, $n_B = \sum_{v_j \in B} n_j$, $I = \sum_{j=1}^{281} m_j$ and
$J = \sum_{j=1}^{281} n_j$. Gangnon and Clayton [12] considered the ALR test statistic
$U_{\mathcal{B}}^{(2)}$ with

$$\mathcal{B} = \{C(v_i, r_{ij}) : 0 < r_{ij} \le 20,\ 1 \le i \le 281,\ 1 \le j \le 281\},$$

where $r_{ij} = \|v_i - v_j\|$. We plot in Figure 2 simulated values of $U_{\mathcal{B}}^{(2)}$ under
the null hypothesis (2.1) with $p_0 = 5 \times 10^{-4}$ $(= I/J)$, against quantiles of the
chi-square distribution with one degree of freedom and also against quantiles
of a distribution function $G$ satisfying

$$1 - G(x) = \Big(\frac{2}{\pi x}\Big)^{1/2} e^{-x/2} \tag{4.1}$$

for $x \ge x_0$, with $x_0 = 0.42$ satisfying $2e^{-x_0}/(\pi x_0) = 1$.
The upper tail probabilities of $G$ are expressions often seen in large deviations
saddlepoint approximations.

Since $P\{\chi^2_1 > x\} \le 1 - G(x)$ for all $x > 0$ and $P\{\chi^2_1 > x\} \sim 1 - G(x)$ as
$x \to \infty$, p-value estimates of $U_{\mathcal{B}}^{(k)}$ obtained by comparing against $G$ instead of
the chi-square distribution are slightly more conservative for small p-values.
From the qq-plots, we see that $G$ provides a good fit over a wider range
of values, but for small p-values, which are of primary interest, the two p-value
estimates are comparable.
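Both tail approximations are easy to evaluate. The sketch below computes $x_0$ by bisection, $1 - G(x)$ from (4.1) and $P\{\chi^2_1 > x\} = \operatorname{erfc}(\sqrt{x/2})$. Treating $G(x) = 0$ for $x < x_0$ is an assumption made here (it keeps $G$ continuous, since $1 - G(x_0) = 1$), and the function names are illustrative.

```python
import math


def chi2_1_sf(x: float) -> float:
    """P{chi^2_1 > x} = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def solve_x0(lo: float = 0.1, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Root of 2 exp(-x) / (pi x) = 1; the left side is decreasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2.0 * math.exp(-mid) / (math.pi * mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


X0 = solve_x0()


def g_sf(x: float) -> float:
    """1 - G(x) from (4.1) for x >= x0; G(x) = 0 below x0 (assumed here)."""
    if x < X0:
        return 1.0
    return math.sqrt(2.0 / (math.pi * x)) * math.exp(-x / 2.0)


for x in (2.0, 5.0, 10.0):
    print(x, chi2_1_sf(x), g_sf(x))  # the G tail is the larger of the two
```

`X0` comes out at about 0.419, in line with $x_0 = 0.42$. The inequality $P\{\chi^2_1 > x\} \le 1 - G(x)$ is the Mills-ratio bound $P\{|N(0,1)| > t\} \le 2\varphi(t)/t$ with $t = \sqrt{x}$, and the ratio of the two tails tends to 1 as $x$ grows.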
4.3. Simulated datasets. The example in Section 4.2 is typical of the application
of cluster detection methodology. More than 20,000 circles were
created from comparisons among the 281 census tracts. For a larger number
of census tracts, the number of circles can easily run into the millions. The
computational burden is quite serious if, say, $L = 999$ or 9999 Monte Carlo
simulation runs are used to evaluate p-values. Small p-values are of statistical
interest, yet it is precisely for these cases that Monte Carlo methods
are less reliable. If a person is looking at multiple regions, end-points or
time-points, nominal p-values much smaller than 0.01 may be required for
significance to be declared. Since the relative standard error of a Monte Carlo
estimate of a probability $p$ is $\{(1 - p)/(pL)\}^{1/2}$, for a probability of 0.05, $L = 999$ runs
will give us a relative error of about 0.15, while the corresponding relative error is about
0.3 for a probability of 0.01. In Example 3 below, we compare the analytical
chi-square and $G$ tail approximations [see (4.1)] of the ALR for two different
arrangements of scanning sets. The key advantage of the analytical approximations
lies in composite null situations for which the usual Monte Carlo
methods may not work well (see Section 3).
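The relative errors quoted above follow from the binomial standard error of a Monte Carlo proportion. A short sketch (names illustrative) that computes it and, conversely, roughly how many runs a target relative error requires:

```python
import math


def mc_relative_error(p: float, L: int) -> float:
    """Relative standard error sqrt(p (1 - p) / L) / p of a Monte Carlo estimate of p."""
    return math.sqrt((1.0 - p) / (p * L))


def runs_needed(p: float, rel_err: float) -> int:
    """Number of runs L that brings the relative standard error down to about rel_err."""
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


print(round(mc_relative_error(0.05, 999), 3), round(mc_relative_error(0.01, 999), 3))  # 0.138 0.315
print(runs_needed(0.001, 0.1))
```

The second line shows why very small p-values are costly: a 10% relative error at $p = 0.001$ already needs on the order of $10^5$ runs, each of which evaluates every scanning set.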

Example 3. Let $v_1, \ldots, v_n$ be generated uniformly from the unit square
$[0, 1]^2$, and let

$$\mathcal{B}_1 = \{C(v_i, r_{ij}) : 0 < r_{ij} \le w_1,\ 1 \le i \le n,\ 1 \le j \le n\}, \tag{4.2}$$

where $r_{ij} = \|v_i - v_j\|$.

We shall abuse notation here and denote $\#\{i : v_i \in C\}$ more simply by $\#C$.
For each $1 \le \ell \le L$ with $L$ large, generate independent standard normal
random variables $Y_{1\ell}, \ldots, Y_{n\ell}$ and define

$$Z_{C\ell} = \frac{\sum_{i : v_i \in C} (Y_{i\ell} - \bar Y_\ell)}{\sqrt{(\#C)[1 - (\#C)/n]}}, \quad \text{where } \bar Y_\ell = n^{-1} \sum_{i=1}^n Y_{i\ell}.$$

Let $\tilde U_\ell^{(2)} = 2 \log\big((\#\mathcal{B}_1)^{-1} \sum_{C \in \mathcal{B}_1} e^{Z_{C\ell}^2/2}\big)$. In Figure 3, we plot ordered values of
$\tilde U_\ell^{(2)}$ against quantiles of both the chi-square and $G$ distributions for $w_1 = 0.2$
and various values of $n$. Approximately six hours of computer time were
needed to generate the plot for $n = 1000$. The plots show the $G$ distribution
to be more suitable for estimating moderately small p-values. For smaller
p-values, the chi-square and $G$ distributions give similar approximations.
Similar plots are obtained when experimenting with $w_1 = 0.3$.
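A scaled-down version of this experiment can be sketched in NumPy. The function below builds $\mathcal{B}_1$ from (4.2), obtains all $Z_{C\ell}$ with one matrix product and returns $L$ replicates of $\tilde U_\ell^{(2)}$. The default sizes are small illustrative choices, not those behind Figure 3, and circles containing all $n$ locations are dropped to avoid a zero denominator.

```python
import math

import numpy as np


def simulate_u_tilde(n=100, w1=0.2, L=2000, seed=0):
    """L replicates of U~^(2) = 2 log((#B_1)^{-1} sum_C exp(Z_C^2 / 2)) for B_1 of (4.2)."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(size=(n, 2))
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)      # r_ij = ||v_i - v_j||
    rows = [d[i] <= d[i, j] for i in range(n) for j in range(n) if 0 < d[i, j] <= w1]
    member = np.array(rows, dtype=float)                             # one row per circle C
    size = member.sum(axis=1)                                        # #C
    keep = size < n
    member, size = member[keep], size[keep]
    Y = rng.standard_normal(size=(L, n))
    Y_centred = Y - Y.mean(axis=1, keepdims=True)                    # Y_il - Ybar_l
    Z = (Y_centred @ member.T) / np.sqrt(size * (1.0 - size / n))    # Z_Cl, shape (L, #B_1)
    half_sq = Z ** 2 / 2.0
    top = half_sq.max(axis=1, keepdims=True)                         # log-sum-exp shift
    return 2.0 * (top[:, 0] + np.log(np.mean(np.exp(half_sq - top), axis=1)))


u = simulate_u_tilde()
c = 6.0
print((u > c).mean(), math.erfc(math.sqrt(c / 2.0)))  # empirical tail vs P{chi^2_1 > c}
```

Each $Z_{C\ell}$ is exactly standard normal, since $\operatorname{Var}\{\sum_{i : v_i \in C}(Y_{i\ell} - \bar Y_\ell)\} = (\#C)[1 - (\#C)/n]$. Sorting `u` and plotting it against chi-square or $G$ quantiles reproduces the type of qq-plot shown in Figure 3.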

5. Discussion. The New York leukaemia dataset in Section 4.2 is a typical
dataset in which the locations are concentrated on a number of geographical
centers instead of spreading over a domain $D$, and strictly speaking, the
positive density assumption [see (A2)] does not hold. However, the purpose
of the assumption is to ensure that the number of subjects in each scanning
set goes to infinity at a fast enough rate, and as this is satisfied in this situation,
the chi-square approximation is still valid. Similarly, the restriction
that the sets in (A3) all have the same shape can be relaxed in
these types of datasets. The relaxation allows us to deal with the detection
of irregularly shaped clusters considered in, for example, [25, 30]. The condition
that $\mathcal{B}$ be dependent only on the locations $x_i$ and not on the responses
$X_i$ is, however, necessary for the chi-square approximation to hold.

The qq-plots in Figures 2 and 3 show that the p-value approximations
are inaccurate for small thresholds. This is consistent with the conditions of
Theorem 1, which says that moderate or larger values of the threshold are
needed for the p-value approximations to be accurate. This is not a problem
because when large p-values are encountered, it suffices to state that the
p-value is larger than a specified significance level. For very large thresholds,
the difference between the approximated and empirical quantiles is due to the
inaccuracy of the Monte Carlo method. Though Theorem 1 is stated only in
terms of approximating unconditional p-values, a rough calculation shows
that the chi-square approximation on the four conditional p-values of the
ALR test statistics in Table 3 has the accuracy of about 4000 simulation
runs. The chi-square approximations are also within one standard error of
the Monte Carlo p-values in Table 5.

The overfitting of nuisance parameters when using Monte Carlo methods
for p-value estimation of the spatial scan statistic was mentioned by
Neill, Moore and Cooper [24], and this phenomenon likely contributed to
the conservative p-values seen in Examples 1 and 2. The authors provided
convincing arguments for why quick detection of disease outbreaks is important
and cited the need to perform time-consuming Monte Carlo or bootstrap
replications to provide reliable p-values of the spatial scan statistic
as one justification for developing alternative methodologies. In this paper,
we stick to the method of detecting clusters via GLR values (but taking
averages instead of maxima) popularized by the SaTScan software and
address its drawbacks by providing accurate and easy-to-compute p-value
approximations. These tail probability approximations can be applied even
when nuisance parameters are in the model, and they enhance the attractiveness
of the GLR method by easing its use.

Fig. 3. Qq-plots of $\tilde U_\ell^{(2)}$ against the chi-square and $G$ distributions for $\mathcal{B}_1$ with $n = 10$,
100, 1000 locations and maximum radius $w_1 = 0.2$. [Plot not reproduced; panels plot against chisq quantiles and G quantiles.]

6. Proofs. 



6.1. Proof of Theorem 1. Let $0 < \gamma_0 < p_0 < \gamma_1 < 1$. Then, by large deviations,

(6.1)  $$P_{p_0}\{\hat p_0 < \gamma_0\} + P_{p_0}\{\hat p_0 > \gamma_1\} = o(c^{-1/2}e^{-c/2}),$$
while, by the law of large numbers, we may assume that

(6.2)  $$\liminf \, \inf_{B\in\mathcal{B}} (n_B/J) > 0.$$

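For each $J\gamma_0 \le I \le J\gamma_1$, let $(p_B, \tilde p_B)$ be the roots $(p, \tilde p)$ of

(6.3)  $$n_B p + (J - n_B)\tilde p = I, \qquad n_B\phi(p) + (J - n_B)\phi(\tilde p) = c/2, \quad \text{with } p > \tilde p.$$

Under (6.2), $(p_B, \tilde p_B)$ exists and is unique for all $B \in \mathcal{B}$ when $J$ is large.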

For given values of $p_0$ and $\mathbf{x}$, let $Q_B$ be a probability measure under which $X_1, \dots, X_J$ are independent Bernoulli random variables satisfying

(6.4)  $$Q_B\{X_i = 1 \mid \mathbf{x}_i \in B\} = p_B, \qquad Q_B\{X_i = 1 \mid \mathbf{x}_i \notin B\} = \tilde p_B.$$

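Let $\theta(p) = \log(p/p_0) - \log[(1-p)/(1-p_0)]$. Then, by (2.3) and (6.3),

(6.5)  $$\begin{aligned}
\ell(B) := \log\Big[\frac{dQ_B}{dP_{p_0}}(\mathbf{X})\Big]
&= \sum_{\mathbf{x}_i\in B}\Big[\theta(p_B)X_i + \log\Big(\frac{1-p_B}{1-p_0}\Big)\Big] + \sum_{\mathbf{x}_i\notin B}\Big[\theta(\tilde p_B)X_i + \log\Big(\frac{1-\tilde p_B}{1-p_0}\Big)\Big]\\
&= c/2 + \theta(p_B)\sum_{\mathbf{x}_i\in B}(X_i - p_B) + \theta(\tilde p_B)\sum_{\mathbf{x}_i\notin B}(X_i - \tilde p_B).
\end{aligned}$$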
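The roots of (6.3) and the identity (6.5) can be checked numerically. The Python sketch below is illustrative only: it assumes that $\phi$ in (2.3) is the Bernoulli Kullback-Leibler divergence $\phi(p) = p\log(p/p_0) + (1-p)\log[(1-p)/(1-p_0)]$ (the form under which the last equality in (6.5) holds), takes $p_0 = I/J$ as the first equation of (6.3) suggests, solves (6.3) by bisection, and compares $\log(dQ_B/dP_{p_0})$ computed subject by subject with the right-hand side of (6.5). The counts, the assignment of subjects to $B$ and the random responses are hypothetical.

```python
import math
import random


def phi(p, p0):
    """Assumed form of phi: Bernoulli Kullback-Leibler divergence of p from p0."""
    return p * math.log(p / p0) + (1 - p) * math.log((1 - p) / (1 - p0))


def theta(p, p0):
    """theta(p) = log(p/p0) - log[(1-p)/(1-p0)]."""
    return math.log(p / p0) - math.log((1 - p) / (1 - p0))


def solve_6_3(J, n_B, I, c, iters=200):
    """Bisection for the roots (p_B, tilde p_B) of (6.3) with p_B > tilde p_B, taking p0 = I/J."""
    p0 = I / J

    def tilde(p):  # first equation of (6.3): n_B p + (J - n_B) tilde p = I
        return (I - n_B * p) / (J - n_B)

    def g(p):  # second equation of (6.3) minus c/2; increasing for p > p0
        return n_B * phi(p, p0) + (J - n_B) * phi(tilde(p), p0) - c / 2

    lo, hi = p0, min(1.0, I / n_B) - 1e-12  # keeps p < 1 and tilde p > 0
    if g(hi) <= 0:
        raise ValueError("c is too large for these counts; (6.3) has no root")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    p_B = 0.5 * (lo + hi)
    return p_B, tilde(p_B)


def check_6_5(J=200, n_B=50, I=60, c=10.0, seed=1):
    """Compare log dQ_B/dP_{p0} computed directly with the right-hand side of (6.5)."""
    p0 = I / J
    p_B, tp_B = solve_6_3(J, n_B, I, c)
    rng = random.Random(seed)
    X = [rng.randint(0, 1) for _ in range(J)]
    in_B = [i < n_B for i in range(J)]  # hypothetical layout: the first n_B subjects lie in B
    direct = 0.0
    for x, b in zip(X, in_B):
        q = p_B if b else tp_B
        direct += math.log(q / p0) if x else math.log((1 - q) / (1 - p0))
    rhs = (c / 2
           + theta(p_B, p0) * sum(x - p_B for x, b in zip(X, in_B) if b)
           + theta(tp_B, p0) * sum(x - tp_B for x, b in zip(X, in_B) if not b))
    return direct, rhs


if __name__ == "__main__":
    print(check_6_5())  # the two numbers agree up to rounding error
```

Since both sides of (6.5) are affine in $\mathbf{X}$, the agreement holds for every response vector, not only for those with $\sum_i X_i = I$.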

The following supporting lemmas hold uniformly over $\gamma_0 \le p_0 \le \gamma_1$ under the conditions of Theorem 1. The proof of Lemma 1 is given in Appendix C, while the proof of Lemma 2(a) uses arguments in the proof of (6.11), which is also given in Appendix C. The proof of Lemma 2(b) is relatively straightforward and is thus omitted.
Lemma 1. Assume (6.2).

(a) $S^{(1)}(B) \ge \ell(B)$ for all $B \in \mathcal{B}$.

(b) There exists $\eta_c \to 0$ as $c \to \infty$ such that

$$|S^{(1)}(B) - \ell(B)| \le \eta_c \quad \text{whenever } |S^{(1)}(B) - c/2| \le c^{1/3}.$$

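Lemma 2. (a) Let

$$V_B = \log\Big\{\sum_{C\in\mathcal{B}} (dQ_C/dQ_B)(\mathbf{X})\Big\} = \log\Big(\sum_{C\in\mathcal{B}} e^{\ell(C) - \ell(B)}\Big).$$

Then, whenever $c \le U^{(1)} \le c + c^{1/3}$,

$$\ell(B) + V_B - \log(\#\mathcal{B}) = \log\Big((\#\mathcal{B})^{-1}\sum_{C\in\mathcal{B}} e^{\ell(C)}\Big) = U^{(1)}/2 + o(1).$$

(b) $Q_B\{\sum_{i=1}^J X_i = I \mid \mathbf{x}\} \sim P_{p_0}\{\sum_{i=1}^J X_i = I \mid \mathbf{x}\}$ uniformly over $B \in \mathcal{B}$.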

We shall now provide the key arguments in the proof of Theorem 1. Let $B_{\max}$ maximize $\sum_{\mathbf{x}_i\in B}(X_i - p_B)$ over $B \in \mathcal{B}$, with an arbitrary ordering imposed on $\mathcal{B}$ to break ties. Under $Q_B$, conditioned on $c \le U^{(1)} \le c + c^{1/3}$ and $B_{\max} = B$, $\ell(B)$ has an asymptotic density $(2\pi c)^{-1/2}$ on the interval $(c/2 - c^{1/3}, c/2 + c^{1/3})$, and is asymptotically independent of both $V_B$ and $I_{\{B_{\max}=B\}}$. The random variable $V_B$ summarizes information on the local fluctuations of the GLR values for sets near $B$ when $B_{\max} = B$, and its value is determined chiefly by a small set of $X_i$ with $\mathbf{x}_i$ near the boundary of $B$, because under $Q_B$, $e^{\ell(C) - \ell(B)}$ is small for $C$ far from $B$. Similarly, $I_{\{B_{\max}=B\}}$ is determined by the values of $X_i$ with $\mathbf{x}_i$ located near the boundary of $B$. The statistic $\ell(B)$, on the other hand, is asymptotically $N(c/2, c)$ under $Q_B$ and is asymptotically independent of any small set of $X_i$. We thus obtain, formally, for $\gamma_0 \le p_0 \le \gamma_1$,

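(6.6)  $$\begin{aligned}
P_{p_0}\{U^{(1)} \ge c \mid I, \mathbf{x}\}
&= \sum_{B\in\mathcal{B}} P_{p_0}\{U^{(1)} \ge c,\ B_{\max} = B \mid I, \mathbf{x}\}\\
&\sim \sum_{B\in\mathcal{B}} E_{Q_B}\big(e^{-\ell(B)} I_{\{U^{(1)} \ge c,\ B_{\max} = B\}} \mid I, \mathbf{x}\big)\\
&\sim \sum_{B\in\mathcal{B}} E_{Q_B}\big(E[e^{-\ell(B)} I_{\{\ell(B) \ge c/2 - V_B + \log(\#\mathcal{B}),\ B_{\max} = B\}} \mid V_B] \mid I, \mathbf{x}\big)\\
&\sim \sum_{B\in\mathcal{B}} E_{Q_B}\Big(I_{\{B_{\max} = B\}} \int_{c/2 - V_B + \log(\#\mathcal{B})}^{\infty} (2\pi c)^{-1/2} e^{-y}\,dy \;\Big|\; I, \mathbf{x}\Big)\\
&= (2\pi c)^{-1/2} e^{-c/2} (\#\mathcal{B})^{-1} \sum_{B\in\mathcal{B}} E_{Q_B}\big(e^{V_B} I_{\{B_{\max} = B\}} \mid I, \mathbf{x}\big)\\
&= (2\pi c)^{-1/2} e^{-c/2} (\#\mathcal{B})^{-1} \sum_{B\in\mathcal{B}} \sum_{C\in\mathcal{B}} Q_C\{B_{\max} = B \mid I, \mathbf{x}\}.
\end{aligned}$$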



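We then switch the summation signs in the last line of (6.6); since $\sum_{B\in\mathcal{B}} Q_C\{B_{\max} = B \mid I, \mathbf{x}\} = 1$ for each $C \in \mathcal{B}$, this shows that $P_{p_0}\{U^{(1)} \ge c \mid I, \mathbf{x}\} \sim (2\pi c)^{-1/2}e^{-c/2}$, and (2.7) for $k = 1$ then follows from (6.1). By (6.5), $P_{p_0}(A \mid \mathbf{x}) = E_{Q_B}(e^{-\ell(B)} I_A \mid \mathbf{x})$, where $A = \{U^{(1)} \ge c,\ B_{\max} = B,\ \sum_{i=1}^J X_i = I\}$, and the relation between the first and second lines of (6.6) follows from Lemma 2(b). For additional details on (6.6), see Appendix C.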

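Since $S^{(2)}(B) = S^{(2)}(D\setminus B)$ and $p_0$ lies between $m_B/n_B$ and $m_{D\setminus B}/n_{D\setminus B}$, one of $S^{(1)}(B)$ and $S^{(1)}(D\setminus B)$ equals $S^{(2)}(B)$ while the other is zero. It follows that

(6.7)  $$e^{S^{(2)}(B)} = e^{S^{(1)}(B)} + e^{S^{(1)}(D\setminus B)} - 1.$$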

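Let $\tilde{\mathcal{B}} = \{D\setminus B : B \in \mathcal{B}\}$ and $\tilde U = 2\log\big([2(\#\mathcal{B})]^{-1}\sum_{B\in\mathcal{B}\cup\tilde{\mathcal{B}}} e^{S^{(1)}(B)}\big)$. Then, by the arguments leading to (2.7) for $k = 1$,

(6.8)  $$P_{p_0}\{\tilde U \ge c - 2\log 2\} \sim [2\pi(c - 2\log 2)]^{-1/2}e^{-(c - 2\log 2)/2} \sim [2/(\pi c)]^{1/2}e^{-c/2}.$$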

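By (6.7), $U^{(2)} = 2\log\big((\#\mathcal{B})^{-1}\sum_{B\in\mathcal{B}} e^{S^{(2)}(B)}\big) = \tilde U + 2\log 2 + o(1)$ when $U^{(2)} \ge c$ (the $-1$ in (6.7) contributes only $o(1)$ once the average is of order $e^{c/2}$), and hence (2.7) for $k = 2$ follows from (6.8).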
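The constants in (6.8) can be checked numerically. Shifting the threshold by $2\log 2$ doubles the $k = 1$ tail, giving $[2/(\pi c)]^{1/2}e^{-c/2}$, which is also the leading term of the exact $\chi^2_1$ tail $P\{\chi^2_1 \ge c\} = \operatorname{erfc}(\sqrt{c/2})$. The Python sketch below (standard library only) compares these quantities; it checks asymptotic constants only and is not a p-value routine for the ALR statistics.

```python
import math


def tail_k1(c):
    """Leading-order tail for k = 1: (2*pi*c)^(-1/2) * exp(-c/2)."""
    return (2 * math.pi * c) ** -0.5 * math.exp(-c / 2)


def tail_k2(c):
    """Leading-order tail for k = 2: [2/(pi*c)]^(1/2) * exp(-c/2)."""
    return math.sqrt(2 / (math.pi * c)) * math.exp(-c / 2)


def chisq1_tail(c):
    """Exact P{chi-square with 1 degree of freedom >= c} = erfc(sqrt(c/2))."""
    return math.erfc(math.sqrt(c / 2))


if __name__ == "__main__":
    for c in (10.0, 20.0, 40.0, 80.0):
        shifted = tail_k1(c - 2 * math.log(2))  # middle expression in (6.8)
        print(f"c={c:5.1f}  shifted/k2={shifted / tail_k2(c):.4f}  "
              f"k2/chisq1={tail_k2(c) / chisq1_tail(c):.4f}")
```

Both printed ratios exceed 1 and decrease toward 1 as $c$ grows; the first equals $[c/(c - 2\log 2)]^{1/2}$ exactly.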

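6.2. Proof of Theorem 2. Let $\Sigma = (\rho_{BC})_{B,C\in\mathcal{B}}$ be the covariance matrix of $\mathbf{Z} = (Z_B)_{B\in\mathcal{B}}$, a multivariate normal with $EZ_B = 0$ and $\mathrm{Var}(Z_B) = 1$ for all $B \in \mathcal{B}$ under probability measure $P$. Hence $\rho_{BC} = (\lambda_{B\cap C} - \lambda_B\lambda_C)/[\lambda_B(1 - \lambda_B)\lambda_C(1 - \lambda_C)]^{1/2}$. Fix $c > 0$ and $B \in \mathcal{B}$, and let $Q_B$ be a probability measure under which $\omega(A) \sim N(\lambda_B^{-1/2}(1 - \lambda_B)^{1/2}\lambda_A c^{1/2}, \lambda_A)$ for $A \subset B$ and $\omega(A) \sim N(-\lambda_B^{1/2}(1 - \lambda_B)^{-1/2}\lambda_A c^{1/2}, \lambda_A)$ for $A \subset D\setminus B$, with $\omega(A)$ and $\omega(C)$ independent when $A$ and $C$ are disjoint sets. Under $Q_B$, $\mathbf{Z}$ is multivariate normal with covariance matrix $\Sigma$ and $E_{Q_B}Z_C = c^{1/2}\rho_{BC}$ for all $C \in \mathcal{B}$. Moreover,

(6.9)  $$\ell(B) := \log\Big(\frac{dQ_B}{dP}\Big) = c^{1/2}Z_B - c/2.$$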



We next use a linearization argument to justify the replacement of $Z_{B+}^2/2$ in the expression of $U_z^{(1)}$ by $\ell(B)$. By convexity, $Z_{B+}^2/2 \ge \ell(B)$ for all $B \in \mathcal{B}$, with equality when $Z_{B+} = c^{1/2}$. By a Taylor expansion,

(6.10)  $$\sup_{Z_B:\ |Z_{B+}^2 - c| \le 2c^{1/3}} |\ell(B) - Z_{B+}^2/2| = O(c^{-1/3}) \quad \text{as } c \to \infty.$$
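The convexity bound and the rate in (6.10) can be made explicit by completing the square. The following short derivation is one way to see both; it uses only (6.9) and $Z_{B+} = \max(Z_B, 0)$, and is offered as a supplementary check rather than as the original argument.

```latex
\frac{Z_{B+}^2}{2} - \ell(B)
  = \frac{Z_{B+}^2}{2} - c^{1/2} Z_B + \frac{c}{2}
  = \frac{(Z_{B+} - c^{1/2})^2}{2} + c^{1/2}\,(Z_{B+} - Z_B) \;\ge\; 0,
  \qquad \text{since } Z_{B+} \ge Z_B .

\text{If } |Z_{B+}^2 - c| \le 2c^{1/3} \text{ and } c > 2^{3/2}, \text{ then } Z_{B+} > 0,
\text{ so } Z_{B+} = Z_B \text{ and }
|Z_{B+} - c^{1/2}| = \frac{|Z_{B+}^2 - c|}{Z_{B+} + c^{1/2}} \le 2c^{-1/6},
\text{ hence } 0 \le \frac{Z_{B+}^2}{2} - \ell(B) \le 2c^{-1/3}.
```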
Let $V_B = \log\{\sum_{C\in\mathcal{B}}(dQ_C/dQ_B)(\mathbf{Z})\} = \log(\sum_{C\in\mathcal{B}} e^{\ell(C) - \ell(B)})$. Then, by (6.10), there exists $\zeta_c = O(c^{-1/3})$ such that, whenever $c \le U_z^{(1)} \le c + c^{1/3}$,

(6.11)  $$U_z^{(1)}/2 - \zeta_c \le \ell(B) + V_B - \log(\#\mathcal{B}) = \log\Big((\#\mathcal{B})^{-1}\sum_{C\in\mathcal{B}} e^{\ell(C)}\Big) \le U_z^{(1)}/2$$

(see Appendix C). We then apply the steps in (6.6), without the conditioning on $I$ and $\mathbf{x}$, to obtain the tail probabilities of $U_z^{(1)}$. For extensions to the tail probabilities of $U_z^{(2)}$, apply the arguments in the last paragraph of Section 6.1.

APPENDIX A: THEOREM 3 AND ITS PROOF 

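Theorem 3. Let $S_{1c}, \dots, S_{nc}$ be random variables and assume that there exist a constant $K > 0$ and random variables $Y_{kj}$ such that $P\{Y_{kj} = 0\} = 0$ for all $k \ne j$ and, as $c \to \infty$,

(A.1)  $$P\{S_{kc} \ge c + y\} \sim Kc^{-1/2}e^{-c-y},$$

while, conditioned on $S_{kc} \ge c + y$,

(A.2)  $$(S_{kc} - S_{1c}, \dots, S_{kc} - S_{nc}) \Rightarrow (Y_{k1}, \dots, Y_{kn})$$

for all $1 \le k \le n$ and $y \in \mathbb{R}$. Then

(A.3)  $$n^{-1}\sum_{k=1}^n E\Big[\Big(\sum_{j=1}^n e^{-Y_{kj}}\Big) I_{\{Y_{kj} \ge 0 \text{ for all } j\}}\Big] = 1$$

and

(A.4)  $$P\Big\{n^{-1}\sum_{i=1}^n e^{S_{ic}} \ge e^c\Big\} \sim Kc^{-1/2}e^{-c} \quad \text{as } c \to \infty.$$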

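Proof. Let $M_c = \max_{1\le k\le n} S_{kc}$. For a given $\varepsilon > 0$, let $0 = y_1 < \cdots < y_m$ be such that $P\{Y_{kj} = y_r\} = 0$ for all $1 \le r \le m$ and $k \ne j$, and $\sup_{1\le r\le m}(e^{-y_r} - e^{-y_{r+1}}) < \varepsilon$, where $y_{m+1} = \infty$. Then, by (A.1) and (A.2), for all $k \ne j$,

(A.5)  $$\begin{aligned}
P\{S_{jc} \ge c,\ M_c = S_{kc}\}
&= \sum_{r=1}^m P\{S_{jc} \ge c,\ M_c = S_{kc},\ y_r \le S_{kc} - S_{jc} < y_{r+1}\}\\
&\le \sum_{r=1}^m P\{S_{kc} \ge c + y_r,\ M_c = S_{kc},\ y_r \le S_{kc} - S_{jc} < y_{r+1}\}\\
&\sim Kc^{-1/2}e^{-c}\sum_{r=1}^m e^{-y_r}P\{Y_{ki} \ge 0 \text{ for all } i,\ y_r \le Y_{kj} < y_{r+1}\}\\
&\le Kc^{-1/2}e^{-c}E\big[(e^{-Y_{kj}} + \varepsilon)I_{\{Y_{ki} \ge 0 \text{ for all } i\}}\big].
\end{aligned}$$

Similarly,

(A.6)  $$\begin{aligned}
P\{S_{jc} \ge c,\ M_c = S_{kc}\}
&\ge (K + o(1))c^{-1/2}e^{-c}\sum_{r=1}^m e^{-y_{r+1}}P\{Y_{ki} \ge 0 \text{ for all } i,\ y_r \le Y_{kj} < y_{r+1}\}\\
&\ge (K + o(1))c^{-1/2}e^{-c}E\big[(e^{-Y_{kj}} - \varepsilon)I_{\{Y_{ki} \ge 0 \text{ for all } i\}}\big].
\end{aligned}$$

By selecting $\varepsilon$ arbitrarily small, it follows from (A.5) and (A.6) that

(A.7)  $$P\{S_{jc} \ge c,\ M_c = S_{kc}\} \sim Kc^{-1/2}e^{-c}E\big[e^{-Y_{kj}}I_{\{Y_{ki} \ge 0 \text{ for all } i\}}\big].$$

The asymptotic relation (A.7) also holds for $k = j$ by applying (A.1) and (A.2) with $y = 0$, noting that $Y_{jj}$ is a zero-valued random variable for all $j$. We then add up (A.7) over $1 \le j \le n$, $1 \le k \le n$ (for each $j$, the events $\{M_c = S_{kc}\}$, $1 \le k \le n$, cover the sample space, and ties are asymptotically negligible because $P\{Y_{kj} = 0\} = 0$ for $k \ne j$) and compare against the asymptotic relation $\sum_{j=1}^n P\{S_{jc} \ge c\} \sim Knc^{-1/2}e^{-c}$, which follows from (A.1), to obtain (A.3).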

Since $\log(n^{-1}\sum_{j=1}^n e^{S_{jc} - S_{kc}}) \le 0$ when $M_c = S_{kc}$, by (A.1) and (A.2),

(A.8)  $$\begin{aligned}
P\Big\{n^{-1}\sum_{j=1}^n e^{S_{jc}} \ge e^c,\ M_c = S_{kc}\Big\}
&= P\Big\{S_{kc} \ge c - \log\Big(n^{-1}\sum_{j=1}^n e^{S_{jc} - S_{kc}}\Big),\ M_c = S_{kc}\Big\}\\
&\sim Kc^{-1/2}e^{-c}E\Big[\Big(n^{-1}\sum_{j=1}^n e^{-Y_{kj}}\Big)I_{\{Y_{kj} \ge 0 \text{ for all } j\}}\Big],
\end{aligned}$$

and (A.4) follows from (A.3) by adding (A.8) over $1 \le k \le n$. To show the last relation in (A.8), we use the discretization argument described earlier. Given any $\varepsilon > 0$, select $0 = y_1 < \cdots < y_m$ such that $P\{-\log(n^{-1}\sum_{j=1}^n e^{-Y_{kj}}) = y_r,\ Y_{kj} \ge 0 \text{ for all } j\} = 0$ for all $1 \le r \le m$ and $1 \le k \le n$, and $\sup_{1\le r\le m}(e^{-y_r} - e^{-y_{r+1}}) < \varepsilon$, with $y_{m+1} = \infty$. We then express asymptotic upper and lower bounds of

$$P\Big\{S_{kc} \ge c - \log\Big(n^{-1}\sum_{j=1}^n e^{S_{jc} - S_{kc}}\Big),\ M_c = S_{kc},\ y_r \le -\log\Big(n^{-1}\sum_{j=1}^n e^{S_{jc} - S_{kc}}\Big) < y_{r+1}\Big\}$$

in terms of $\varepsilon$ and expectations involving $Y_{ki}$ before letting $\varepsilon \to 0$. The details are omitted. □
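Theorem 3 and the ALR statistics both involve the logarithm of an average of exponentials; for instance, $U^{(2)} = 2\log((\#\mathcal{B})^{-1}\sum_{B\in\mathcal{B}} e^{S^{(2)}(B)})$. Large GLR values make $e^{S}$ overflow in floating point, so a practical computation shifts by the maximum (the log-sum-exp device). The Python sketch below is a standard numerical technique rather than a procedure from this paper. It also checks the elementary bounds $M - \log n \le \log(n^{-1}\sum_j e^{S_j}) \le M$, with $M = \max_j S_j$; the upper bound is the fact, used in the proof, that $\log(n^{-1}\sum_j e^{S_{jc} - S_{kc}}) \le 0$ when $M_c = S_{kc}$. The GLR values are hypothetical.

```python
import math


def log_mean_exp(values):
    """log(n^{-1} * sum_j exp(v_j)), computed by shifting by the maximum to avoid overflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values)) - math.log(len(values))


def alr_statistic(glr_values):
    """ALR form 2 * log( (#B)^{-1} * sum_B exp(S(B)) ) for a list of GLR values S(B)."""
    return 2.0 * log_mean_exp(glr_values)


if __name__ == "__main__":
    S = [950.0, 1000.0, 1000.0, 3.2]  # hypothetical GLR values; math.exp(1000.0) would overflow
    M = max(S)
    lme = log_mean_exp(S)
    print(lme, M - math.log(len(S)) <= lme <= M)
    print(alr_statistic(S))
```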

APPENDIX B: ASYMPTOTIC EXPANSIONS OF THE LOG LIKELIHOOD FUNCTION

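For a fixed $B \in \mathcal{B}$, let $z_i(\theta, \beta) = \beta'\mathbf{u}_i + \theta I_{\{\mathbf{x}_i\in B\}}$ and let

$$\ell_i(\theta, \beta) = -X_i\log(1 + e^{-z_i}) - (1 - X_i)\log(1 + e^{z_i})$$

be the log likelihood function corresponding to the $i$th subject. Since $\partial\ell_i/\partial z_i = X_i - p_i$, where $p_i = (1 + e^{-z_i})^{-1}$, evaluated at some parameter $\beta$ and $\theta = 0$, it follows that

$$\frac{\partial\ell_i}{\partial\theta} = (X_i - p_i)I_{\{\mathbf{x}_i\in B\}} \quad\text{and}\quad \frac{\partial\ell_i}{\partial\beta_k} = u_{ik}(X_i - p_i) \quad\text{for } 1 \le k \le r.$$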
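The score expressions above can be checked against finite differences. The Python sketch below codes $\ell_i(\theta, \beta)$ for a single subject, evaluates the analytic derivatives at $\theta = 0$, and compares them with central differences; the coefficient and covariate values are hypothetical.

```python
import math


def loglik_i(theta, beta, x_i, u_i, in_B):
    """Per-subject log likelihood ell_i(theta, beta) with z_i = beta'u_i + theta * I{x_i in B}."""
    z = sum(b * u for b, u in zip(beta, u_i)) + (theta if in_B else 0.0)
    return -x_i * math.log1p(math.exp(-z)) - (1 - x_i) * math.log1p(math.exp(z))


def analytic_score(beta, x_i, u_i, in_B):
    """Derivatives at theta = 0: [(X_i - p_i) I{x_i in B}, u_i1 (X_i - p_i), ..., u_ir (X_i - p_i)]."""
    z = sum(b * u for b, u in zip(beta, u_i))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(x_i - p) * (1.0 if in_B else 0.0)] + [u * (x_i - p) for u in u_i]


def numeric_score(beta, x_i, u_i, in_B, h=1e-6):
    """Central finite differences of loglik_i at theta = 0, for comparison."""
    grad = [(loglik_i(h, beta, x_i, u_i, in_B) - loglik_i(-h, beta, x_i, u_i, in_B)) / (2 * h)]
    for k in range(len(beta)):
        up, down = list(beta), list(beta)
        up[k] += h
        down[k] -= h
        grad.append((loglik_i(0.0, up, x_i, u_i, in_B)
                     - loglik_i(0.0, down, x_i, u_i, in_B)) / (2 * h))
    return grad


if __name__ == "__main__":
    # Hypothetical subject: intercept plus one covariate, a case (X_i = 1) located in B.
    beta, u_i = [0.3, -0.7], [1.0, 2.0]
    print(analytic_score(beta, 1, u_i, True))
    print(numeric_score(beta, 1, u_i, True))
```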

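To motivate the form of the limiting distribution of $S^{(k)}(B)$, we use a weighted Gram-Schmidt orthogonalization procedure, rather than matrix notation, to describe the first-order quadratic term in a Taylor expansion of $S^{(2)}(B)$. Let $w_i = p_i(1 - p_i)$, and define the weighted dot product $(\mathbf{a}\cdot\mathbf{b})_w = \sum_{i=1}^J a_ib_iw_i$ and norm $\|\mathbf{a}\|_w = (\mathbf{a}\cdot\mathbf{a})_w^{1/2}$. With $\mathbf{u}_k = (u_{1k}, \dots, u_{Jk})'$ the vector of $k$th covariates, define recursively $\tilde{\mathbf{u}}_1 = \mathbf{u}_1$ and $\tilde{\mathbf{u}}_k = \mathbf{u}_k - \sum_{s=1}^{k-1}a_{ks}\tilde{\mathbf{u}}_s$, where $a_{ks} = (\mathbf{u}_k\cdot\tilde{\mathbf{u}}_s)_w/\|\tilde{\mathbf{u}}_s\|_w^2$, for $2 \le k \le r$. Then $(\tilde{\mathbf{u}}_k\cdot\tilde{\mathbf{u}}_s)_w = 0$ for all $k \ne s$. Write $\tilde{\mathbf{u}}_k = (\tilde u_{1k}, \dots, \tilde u_{Jk})'$. Under sufficient regularity conditions, $S^{(2)}(B)$ is equal, up to an $o(1)$ term, to

(B.1)  $$\frac{1}{2v_B}\Big\{\sum_{i=1}^J\Big(I_{\{\mathbf{x}_i\in B\}} - \sum_{k=1}^r\frac{(\mathbf{a}_B\cdot\tilde{\mathbf{u}}_k)_w}{\|\tilde{\mathbf{u}}_k\|_w^2}\tilde u_{ik}\Big)(X_i - p_i)\Big\}^2,$$

where $\mathbf{a}_B = (I_{\{\mathbf{x}_1\in B\}}, \dots, I_{\{\mathbf{x}_J\in B\}})'$ and

$$v_B = \sum_{i=1}^J w_i\Big(I_{\{\mathbf{x}_i\in B\}} - \sum_{k=1}^r\frac{(\mathbf{a}_B\cdot\tilde{\mathbf{u}}_k)_w}{\|\tilde{\mathbf{u}}_k\|_w^2}\tilde u_{ik}\Big)^2.$$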
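The recursion for $\tilde{\mathbf{u}}_k$ and the quadratic term (B.1) are straightforward to compute. A minimal Python sketch, assuming the fitted probabilities $p_i$ are given: it orthogonalizes the covariate columns in the $w$-weighted inner product, subtracts from the indicator vector $\mathbf{a}_B$ its weighted projection on them, and returns $\{\sum_i r_i(X_i - p_i)\}^2/(2v_B)$, where $r$ is the adjusted indicator and $v_B$ is taken to be $\sum_i w_i r_i^2$. The function names and toy data are hypothetical.

```python
def wdot(a, b, w):
    """Weighted dot product (a . b)_w = sum_i a_i b_i w_i."""
    return sum(ai * bi * wi for ai, bi, wi in zip(a, b, w))


def weighted_gram_schmidt(columns, w):
    """tilde u_1 = u_1 and tilde u_k = u_k - sum_{s<k} a_ks tilde u_s,
    with a_ks = (u_k . tilde u_s)_w / ||tilde u_s||_w^2."""
    ortho = []
    for u in columns:
        v = list(u)
        for t in ortho:
            a_ks = wdot(u, t, w) / wdot(t, t, w)
            v = [vi - a_ks * ti for vi, ti in zip(v, t)]
        ortho.append(v)
    return ortho


def quadratic_term_b1(in_B, X, p, columns):
    """Quadratic term of (B.1): {sum_i r_i (X_i - p_i)}^2 / (2 v_B), where r is a_B minus
    its weighted projection on the covariate columns and v_B = sum_i w_i r_i^2."""
    w = [pi * (1.0 - pi) for pi in p]
    a_B = [1.0 if b else 0.0 for b in in_B]
    r = list(a_B)
    for t in weighted_gram_schmidt(columns, w):
        coef = wdot(a_B, t, w) / wdot(t, t, w)
        r = [ri - coef * ti for ri, ti in zip(r, t)]
    score = sum(ri * (xi - pi) for ri, xi, pi in zip(r, X, p))
    v_B = wdot(r, r, w)
    return score ** 2 / (2.0 * v_B)


if __name__ == "__main__":
    # Hypothetical data: intercept column u_1 = 1 and one covariate u_2, with fitted p_i.
    cols = [[1.0, 1.0, 1.0, 1.0, 1.0], [0.2, -1.0, 0.5, 1.5, -0.3]]
    p = [0.3, 0.2, 0.4, 0.6, 0.25]
    w = [pi * (1 - pi) for pi in p]
    ortho = weighted_gram_schmidt(cols, w)
    print(wdot(ortho[0], ortho[1], w))  # close to 0
    print(quadratic_term_b1([True, True, False, False, False], [1, 0, 1, 1, 0], p, cols))
```

With only an intercept and constant $p_i = p$, the term reduces to $J(m_B - n_BI/J)^2/\{2p(1-p)n_B(J - n_B)\}$, the familiar binomial score statistic.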

We next consider a characterization of the limiting distributions of $S^{(2)}(B)$, $B \in \mathcal{B}$. Let $\eta(t) = \lambda(t)E(w_1 \mid \mathbf{x}_1 = t)/E(w_1)$ and $g_k(t) = E(u_{1k}w_1 \mid \mathbf{x}_1 = t)/E(w_1 \mid \mathbf{x}_1 = t)$ for $1 \le k \le r$, and assume that they are positive and continuous on $D$. Let $\tilde g_1 = g_1\ (\equiv 1)$ and define recursively, for $k \ge 2$,

$$\tilde g_k(t) = g_k(t) - \sum_{s=1}^{k-1}\mu_{ks}\tilde g_s(t),$$

where

$$\mu_{ks} = b_s^{-1}\int_D g_k(t)\tilde g_s(t)\eta(t)\,dt \quad\text{and}\quad b_s = \int_D \tilde g_s^2(t)\eta(t)\,dt.$$

Then $\int_D \tilde g_k(t)\tilde g_s(t)\eta(t)\,dt = 0$ for all $k \ne s$. Let $\omega$ be Gaussian white noise on $D$ with $\omega(B) \sim N(0, \eta_B)$, where $\eta_B = \int_B \eta(t)\,dt$. Then $S^{(2)}(B)$ converges weakly to $Z_B^2/2$, where

$$Z_B = v_B\Big[\omega(B) - \sum_{k=1}^r\Big(b_k^{-1}\int_B \tilde g_k(t)\eta(t)\,dt\Big)\omega_k(D)\Big],$$

with $\omega_k(D) = \int_D \tilde g_k(t)\,\omega(dt)$, and $v_B$ is a normalizing constant ensuring $\mathrm{Var}(Z_B) = 1$.

The justification behind (3.6) requires arguments used in the proof of Theorem 2. For a given $c > 0$ and $B \in \mathcal{B}$, let $Q_B$ be a probability measure such that $\omega(A) \sim N(\int_A \mu(t)\eta(t)\,dt, \eta_A)$, where $\mu(t) = c^{1/2}v_B[I_{\{t\in B\}} - \sum_{k=1}^r \gamma_{kB}\tilde g_k(t)]$ and $\gamma_{kB} = b_k^{-1}\int_B \tilde g_k(t)\eta(t)\,dt$. Moreover, under $Q_B$, $\omega(A)$ and $\omega(C)$ are independent whenever $A$ and $C$ are disjoint. By Girsanov's theorem (see Chapter 3.5 of [14]),

$$\ell(B) := \log\Big(\frac{dQ_B}{dP}\Big) = \int_D \mu(t)\,\omega(dt) - \frac{1}{2}\int_D \mu^2(t)\eta(t)\,dt = c^{1/2}Z_B - c/2.$$

We can then proceed, as in the proof of Theorem 2 in Section 6.2, by analyzing the behavior of $U_z$ under $Q_B$ and using a linearization argument to estimate $Z_B^2/2$ by $\ell(B)$ when $Z_B$ is close to $c^{1/2}$. The details are omitted.



APPENDIX C: PROOFS OF LEMMA 1, (6.11) AND (6.6) 



Proof of Lemma 1. Let $\alpha_B = n_B/J$ and $f(p) = \alpha_B\phi(p) + (1 - \alpha_B)\phi((p_0 - \alpha_Bp)/(1 - \alpha_B))$. The tangent of $f$ at $p = p_B$ is

$$g(p) := \alpha_B\{\theta(p_B)p + \log[(1 - p_B)/(1 - p_0)]\} + (1 - \alpha_B)\{\theta(\tilde p_B)(p_0 - \alpha_Bp)/(1 - \alpha_B) + \log[(1 - \tilde p_B)/(1 - p_0)]\}.$$

Since $S^{(1)}(B) = Jf(m_B/n_B)$ and $\ell(B) = Jg(m_B/n_B)$, Lemma 1(a) follows from the convexity of $f$.

Next, let $K = [p_B, p]$ if $p_B < p$ and $K = [p, p_B]$ if $p_B > p$. Since $f(p_B) = g(p_B)$ and $g$ is linear,

(C.1)  $$|f(p) - g(p)| \le \sup_{q\in K} f''(q)\,(p - p_B)^2/2, \qquad |f(p) - f(p_B)| \ge \inf_{q\in K} f'(q)\,|p - p_B|.$$

Select $p = m_B/n_B$. Since $f(p_0) = f'(p_0) = 0$ and $f(p_B) = c(2J)^{-1} = o(1)$, it follows that $f'(p_B)$ is of order $(c/J)^{1/2}$. If $J|f(p) - f(p_B)|\ (= |S^{(1)}(B) - c/2|) \le c^{1/3}$, then, by the second inequality of (C.1), $|p - p_B| = O(c^{-1/6}J^{-1/2})$. Since $|S^{(1)}(B) - \ell(B)| = J|f(p) - g(p)|$ and $f''$ is of order 1 on $K$, the first inequality of (C.1) gives $|S^{(1)}(B) - \ell(B)| = O(c^{-1/3})$, and Lemma 1(b) follows. □
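Lemma 1(a) rests only on the convexity of $f$: a convex function lies above its tangent line. The Python sketch below checks this numerically, assuming as before that $\phi$ is the Bernoulli Kullback-Leibler divergence from $p_0$; it locates $p_B > p_0$ with $f(p_B) = c/(2J)$ by bisection and evaluates $f - g$ on a grid. The values of $\alpha_B$, $p_0$ and $c/J$ are hypothetical.

```python
import math


def phi(p, p0):
    """Assumed form of phi: Bernoulli Kullback-Leibler divergence of p from p0."""
    return p * math.log(p / p0) + (1 - p) * math.log((1 - p) / (1 - p0))


def theta(p, p0):
    """theta(p) = log(p/p0) - log[(1-p)/(1-p0)], the derivative of phi."""
    return math.log(p / p0) - math.log((1 - p) / (1 - p0))


def f_and_tangent(alpha, p0, c_over_J):
    """Return (f, g, p_B): f as in the proof of Lemma 1, p_B > p0 with f(p_B) = c/(2J),
    and g the tangent line of f at p_B."""
    def tilde(p):
        return (p0 - alpha * p) / (1 - alpha)

    def f(p):
        return alpha * phi(p, p0) + (1 - alpha) * phi(tilde(p), p0)

    lo, hi = p0, min(1.0, p0 / alpha) - 1e-12
    for _ in range(200):  # bisection: f is increasing on (p0, hi)
        mid = 0.5 * (lo + hi)
        if f(mid) < c_over_J / 2:
            lo = mid
        else:
            hi = mid
    p_B = 0.5 * (lo + hi)
    tp_B = tilde(p_B)

    def g(p):
        return (alpha * (theta(p_B, p0) * p + math.log((1 - p_B) / (1 - p0)))
                + (1 - alpha) * (theta(tp_B, p0) * tilde(p) + math.log((1 - tp_B) / (1 - p0))))

    return f, g, p_B


if __name__ == "__main__":
    # Hypothetical values: alpha_B = n_B/J = 0.25, p0 = 0.3, c/J = 0.05.
    f, g, p_B = f_and_tangent(0.25, 0.3, 0.05)
    grid = [k / 100 for k in range(1, 100)]
    print(min(f(p) - g(p) for p in grid) >= -1e-12, f(p_B) - g(p_B))
```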

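Proof of (6.11). The upper bound follows from $Z_{C+}^2/2 \ge \ell(C)$ for all $C$. Next, observe that the constraints $U_z^{(1)} \le c + c^{1/3}$ and $\log(\#\mathcal{B}) = o(c^{1/3})$ together imply that $\sup_{C\in\mathcal{B}} Z_{C+}^2 \le c + 2c^{1/3}$ for all large $c$. Since

$$(\#\mathcal{B})^{-1}\sum_{C\in\mathcal{B}}\big(e^{Z_{C+}^2/2}I_{\{Z_{C+}^2 < c - 2c^{1/3}\}}\big) = o(e^{c/2}),$$

the lower bound follows from applying (6.10) on

$$(\#\mathcal{B})^{-1}\sum_{C\in\mathcal{B}}\big(e^{Z_{C+}^2/2}I_{\{|Z_{C+}^2 - c| \le 2c^{1/3}\}}\big). \qquad □$$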

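Proof of (6.6). Let $\partial B$ denote the boundary of $B$ and let

$$\partial_\varepsilon B = \{t \in D : \|t - u\| \le \varepsilon \text{ for some } u \in \partial B\} \quad\text{with } \varepsilon = c^{-3/5}.$$

Let $B\triangle C = (B\setminus C)\cup(C\setminus B)$ and let $\mathcal{B}_1 = \{C \in \mathcal{B} : B\triangle C \subset \partial_\varepsilon B\}$, the class of $C \in \mathcal{B}$ that are "close" to $B$. In Lemma 3(a) below, we show that $\ell(C) - \ell(B)$ can be approximated by

(C.2)  $$h_B(C) := \sum_{\mathbf{x}_i\in C\setminus B}[\theta(p_C)(X_i - p_B) - \theta(\tilde p_C)(X_i - \tilde p_B)] - \sum_{\mathbf{x}_i\in B\setminus C}[\theta(p_C)(X_i - p_B) - \theta(\tilde p_C)(X_i - \tilde p_B)]$$

for all $C \in \mathcal{B}_1$. We show in Lemma 3(b) that $\sum_{C\in\mathcal{B}\setminus\mathcal{B}_1} e^{\ell(C) - \ell(B)}$ is asymptotically negligible. Hence $V_B = \log(\sum_{C\in\mathcal{B}_1} e^{h_B(C)}) + o(1)$. But $\log(\sum_{C\in\mathcal{B}_1} e^{h_B(C)})$ depends only on $\mathbf{X}_B = \{(\mathbf{x}_i, X_i) : \mathbf{x}_i \in \partial_\varepsilon B\}$, and because $\varepsilon = o(c^{-1/2})$, $\ell(B)$ is asymptotically independent of $\mathbf{X}_B$. Let

$$\Gamma_{\beta_1,\beta_2}(B) = \{\mathbf{x} : \#(\partial_\varepsilon B) \le \beta_1J\varepsilon \text{ and } \#(B\triangle C) \ge \beta_2J\varepsilon \text{ for all } C \in \mathcal{B}\setminus\mathcal{B}_1\}.$$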

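Lemma 3. There exist $\beta_1 > 0$ large enough and $\beta_2 > 0$ small enough such that

(C.3)  $$1 - P(\Gamma_{\beta_1,\beta_2}(B)) = o(e^{-c^{1/3}}) \quad\text{uniformly over } B \in \mathcal{B}.$$

Moreover, for fixed $\beta_1 > 0$ and $\beta_2 > 0$, the following hold uniformly over $\mathbf{x} \in \Gamma_{\beta_1,\beta_2}(B)$.

(a) If $c/2 \le \ell(B) \le c/2 + c^{1/3}$ and $\sum_{i=1}^J X_i = I$, then

(C.4)  $$\ell(C) - \ell(B) = h_B(C) + o(1) \quad\text{uniformly over } C \in \mathcal{B}_1.$$

(b) Let $A(C) = \{\ell(B) \ge c/2 \text{ and } \ell(C) \ge c/2 - c^{1/3}\}$. Then

(C.5)  $$P\{A(C) \mid I, \mathbf{x}\} = o(e^{-c/2 - c^{1/3}}) \quad\text{uniformly over } C \in \mathcal{B}\setminus\mathcal{B}_1.$$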
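Proof. Let $\sigma^{d-1}(\cdot)$ denote a $(d-1)$-dimensional volume element. By (A2), $\#(\partial_\varepsilon B) \sim \mathrm{Bin}(J, q)$, where $q \sim \varepsilon\zeta_B$ and $\zeta_B = 2\int_{\partial B}\lambda(t)\,\sigma^{d-1}(dt)$. Then

(C.6)  $$P\{\#(\partial_\varepsilon B) \ge 3\zeta_B\varepsilon J\} = o(e^{-\eta\zeta_B\varepsilon J}) \quad\text{for some } \eta > 0.$$

Similarly, there exists $\kappa_B > 0$ such that, for all $C \in \mathcal{B}\setminus\mathcal{B}_1$ and $J$ large, $\#(B\triangle C) \sim \mathrm{Bin}(J, q_C)$, where $q_C \ge \varepsilon\kappa_B$. Hence

(C.7)  $$P\{\#(B\triangle C) \le \kappa_B\varepsilon J/3\} = o(e^{-\eta_1\kappa_B\varepsilon J}) \quad\text{for some } \eta_1 > 0.$$

Since $\varepsilon J/c^{1/3} \to \infty$ and $\log(\#\mathcal{B}) = o(c^{1/3})$, (C.3) follows from (C.6) and (C.7).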

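(a) By (6.3) and (6.5),

$$\ell(B) = n_B\phi(p_B) + (J - n_B)\phi(\tilde p_B) + \theta(p_B)\sum_{\mathbf{x}_i\in B}(X_i - p_B) + \theta(\tilde p_B)\sum_{\mathbf{x}_i\notin B}(X_i - \tilde p_B),$$

$$\begin{aligned}
\ell(C) &= n_C\{p_B\log(p_C/p_0) + (1 - p_B)\log[(1 - p_C)/(1 - p_0)]\}\\
&\quad + (J - n_C)\{\tilde p_B\log(\tilde p_C/p_0) + (1 - \tilde p_B)\log[(1 - \tilde p_C)/(1 - p_0)]\}\\
&\quad + \theta(p_C)\sum_{\mathbf{x}_i\in C}(X_i - p_B) + \theta(\tilde p_C)\sum_{\mathbf{x}_i\notin C}(X_i - \tilde p_B).
\end{aligned}$$

If $c/2 \le \ell(B) \le c/2 + c^{1/3}$, then, by (C.2),

$$\begin{aligned}
\ell(C) - h_B(C) &= n_C\{p_B\log(p_C/p_0) + (1 - p_B)\log[(1 - p_C)/(1 - p_0)]\}\\
&\quad + (J - n_C)\{\tilde p_B\log(\tilde p_C/p_0) + (1 - \tilde p_B)\log[(1 - \tilde p_C)/(1 - p_0)]\}\\
&\quad + \theta(p_C)\sum_{\mathbf{x}_i\in B}(X_i - p_B) + \theta(\tilde p_C)\sum_{\mathbf{x}_i\notin B}(X_i - \tilde p_B)\\
&= \ell(B) + O(J(p_C - p_B)^2) + O(c^{1/3}|p_C - p_B|).
\end{aligned}$$

Under $\Gamma_{\beta_1,\beta_2}(B)$, if $C \in \mathcal{B}_1$, then, by (6.3), $|p_C - p_B| = O(c^{1/2}|n_B^{-1/2} - n_C^{-1/2}|) = O(c^{1/2}|n_B - n_C|J^{-3/2}) = O(c^{1/2}J^{-1/2}\varepsilon)$, and we conclude (C.4).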

(b) For given $p_0$ and $\mathbf{x}$ generated according to (A2), let $Q_{B,C}$ be a probability measure under which $X_1, \dots, X_J$ are independent Bernoulli random variables satisfying

$$Q_{B,C}\{X_i = 1\} = (p_BI_{\{\mathbf{x}_i\in B\}} + \tilde p_BI_{\{\mathbf{x}_i\notin B\}} + p_CI_{\{\mathbf{x}_i\in C\}} + \tilde p_CI_{\{\mathbf{x}_i\notin C\}})/2.$$

By the AM-GM inequality and (6.4),

(C.8)  $$Q_{B,C}\{X_i = a\} \ge (Q_B\{X_i = a\}Q_C\{X_i = a\})^{1/2}, \quad a = 0, 1.$$

By (6.3) and the identity $(x + y)/2 - x^{1/2}y^{1/2} = (x^{1/2} - y^{1/2})^2/2$, there exists $\gamma > 0$ such that, whenever $\mathbf{x}_i \in B\triangle C$,

(C.9)  $$Q_{B,C}\{X_i = a\} \ge e^{c\gamma/J}(Q_B\{X_i = a\}Q_C\{X_i = a\})^{1/2}, \quad a = 0, 1.$$

Let $C \in \mathcal{B}\setminus\mathcal{B}_1$. By (C.8), (C.9) and the relation $P_{p_0}\{\sum_{i=1}^J X_i = I \mid \mathbf{x}\} \sim Q_{B,C}\{\sum_{i=1}^J X_i = I \mid \mathbf{x}\}$,

$$P\{A(C) \mid I, \mathbf{x}\} \le (1 + o(1))E_{Q_{B,C}}\big(e^{-[\ell(B) + \ell(C)]/2 - (c\gamma/J)\#(B\triangle C)}I_{A(C)} \mid I, \mathbf{x}\big),$$

and (C.5) holds because, under $\Gamma_{\beta_1,\beta_2}(B)$, $\#(B\triangle C) \ge \beta_2J\varepsilon$ for $C \in \mathcal{B}\setminus\mathcal{B}_1$.
□
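The estimates (C.6) and (C.7) are binomial large-deviation bounds. One standard way to obtain them is through the multiplicative Chernoff bounds $P\{X \ge 3\mu\} \le e^{-(3\log 3 - 2)\mu}$ and $P\{X \le \mu/3\} \le e^{-2\mu/9}$ for a binomial $X$ with mean $\mu$; these yield valid choices of $\eta$ and $\eta_1$, though not necessarily the constants the proof has in mind. The Python sketch below compares exact binomial tails with these bounds; the values of $J$ and $q$ are illustrative, with $q$ playing the role of $\varepsilon\zeta_B$.

```python
import math


def binom_pmf(J, q, i):
    """Bin(J, q) probability of i, computed on the log scale to avoid overflow."""
    log_pmf = (math.lgamma(J + 1) - math.lgamma(i + 1) - math.lgamma(J - i + 1)
               + i * math.log(q) + (J - i) * math.log1p(-q))
    return math.exp(log_pmf)


def upper_tail(J, q, k):
    """Exact P{Bin(J, q) >= k}."""
    return sum(binom_pmf(J, q, i) for i in range(k, J + 1))


def lower_tail(J, q, k):
    """Exact P{Bin(J, q) <= k}."""
    return sum(binom_pmf(J, q, i) for i in range(0, k + 1))


def chernoff_three_times_mean(mu):
    """Chernoff bound P{X >= 3 mu} <= exp(-(3 log 3 - 2) mu) for binomial X with mean mu."""
    return math.exp(-(3 * math.log(3) - 2) * mu)


def chernoff_third_of_mean(mu):
    """Chernoff bound P{X <= mu/3} <= exp(-(2/3)^2 mu / 2) for binomial X with mean mu."""
    return math.exp(-(2.0 / 3.0) ** 2 * mu / 2)


if __name__ == "__main__":
    J = 2000
    for q in (0.005, 0.01, 0.02):
        mu = J * q
        print(mu,
              upper_tail(J, q, math.ceil(3 * mu)), chernoff_three_times_mean(mu),
              lower_tail(J, q, math.floor(mu / 3)), chernoff_third_of_mean(mu))
```

Both bounds decay exponentially in the mean $\mu$, which is of order $\varepsilon J$; this is why $\varepsilon J/c^{1/3} \to \infty$ suffices for (C.3).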

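To show that the second and fifth lines of (6.6) are asymptotically equivalent, assume without loss of generality that $\mathbf{x} \in \bigcap_{B\in\mathcal{B}}\Gamma_{\beta_1,\beta_2}(B)$ for $\beta_1 > 0$ and $\beta_2 > 0$ satisfying (C.3). When $\sum_{i=1}^J X_i = I$, (6.3) and (6.5) give $\ell(B) - c/2 = [\theta(p_B) - \theta(\tilde p_B)]\sum_{\mathbf{x}_i\in B}(X_i - p_B)$, with $\theta(p_B) > \theta(\tilde p_B)$. By (6.3), if $\sum_{\mathbf{x}_i\in B}(X_i - p_B) \le 0$ for all $B \in \mathcal{B}$, then $S^{(1)}(B) \le c/2$ for all $B \in \mathcal{B}$, and hence $U^{(1)} \le c$. The condition $U^{(1)} > c$ thus ensures that $\sum_{\mathbf{x}_i\in B_{\max}}(X_i - p_{B_{\max}}) > 0$ and, consequently, $\ell(B_{\max}) > c/2$ [see (6.5)].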

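Let $\Omega_B = \{\ell(B) \ge c/2 \text{ and } \ell(C) < c/2 - c^{1/3} \text{ for all } C \in \mathcal{B}\setminus\mathcal{B}_1\}$ and $W_B = \log\big((\#\mathcal{B})^{-1}\sum_{C\in\mathcal{B}} e^{\ell(C)}\big)$. By Lemma 2(a) and (C.5), there exists $c' = c + o(1)$ such that

(C.10)  $$\begin{aligned}
E_{Q_B}\big(e^{-\ell(B)}I_{\{U^{(1)} \ge c,\ B_{\max} = B\}} \mid I, \mathbf{x}\big)
&\ge E_{Q_B}\big(e^{-\ell(B)}I_{\{c' + c^{1/3} \ge 2W_B \ge c',\ B_{\max} = B\}}I_{\Omega_B} \mid I, \mathbf{x}\big)\\
&\quad + o\big((\#\mathcal{B})^{-1}c^{-1/2}e^{-c/2}\big).
\end{aligned}$$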

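Let $\mathbf{X}_B = \{(\mathbf{x}_i, X_i) : \mathbf{x}_i \in \partial_\varepsilon B\}$ and assume $\Omega_B$. Then $I_{\{B_{\max} = B\}}$ is determined on knowing $\mathbf{X}_B$ and, in addition, by (C.4), $V_B = \log(\sum_{C\in\mathcal{B}_1} e^{h_B(C)}) + o(1)$. Moreover, by (C.2), $\log(\sum_{C\in\mathcal{B}_1} e^{h_B(C)})$ is determined on knowing $\mathbf{X}_B$ and $\mathbf{x}$. Hence, writing $e^{-\ell(B)} = (\#\mathcal{B})^{-1}e^{V_B}e^{-W_B}$, there exists $c^* = c + o(1)$ such that

(C.11)  $$\begin{aligned}
E_{Q_B}\big(e^{-\ell(B)}I_{\{c' + c^{1/3} \ge 2W_B \ge c',\ B_{\max} = B\}}I_{\Omega_B} \mid \mathbf{X}_B, I, \mathbf{x}\big)
&\ge (1 + o(1))I_{\{B_{\max} = B\}}(\#\mathcal{B})^{-1}e^{V_B}\\
&\quad \times E_{Q_B}\big(e^{-W_B}I_{\{c^* + c^{1/3} \ge 2W_B \ge c^*\}} \mid \mathbf{X}_B, I, \mathbf{x}\big).
\end{aligned}$$

It follows from a local limit theorem that, under $Q_B$, $W_B$ conditioned on $\mathbf{X}_B$ has an asymptotic density of $(2\pi c)^{-1/2}$ uniformly over $[c^*/2, c^*/2 + c^{1/3}]$. Substitute this asymptotic density into (C.11), take the expectation over $\mathbf{X}_B$, and substitute the resulting expression into (C.10) to obtain

(C.12)  $$\begin{aligned}
E_{Q_B}\big(e^{-\ell(B)}I_{\{U^{(1)} \ge c,\ B_{\max} = B\}} \mid I, \mathbf{x}\big)
&\ge (1 + o(1))(2\pi c)^{-1/2}e^{-c/2}\\
&\quad \times (\#\mathcal{B})^{-1}E_{Q_B}\big(e^{V_B}I_{\{B_{\max} = B\}} \mid I, \mathbf{x}\big) + o\big((\#\mathcal{B})^{-1}c^{-1/2}e^{-c/2}\big).
\end{aligned}$$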
Similarly, $E_{Q_B}\big(e^{-\ell(B)}I_{\{U^{(1)} \ge c + c^{1/3},\ B_{\max} = B\}} \mid I, \mathbf{x}\big) = o\big((\#\mathcal{B})^{-1}c^{-1/2}e^{-c/2}\big)$, and (C.12) with the inequality reversed can be obtained. Hence the second and fifth lines of (6.6) are asymptotically equivalent.